*****************************************
Setting the OMP_NUM_THREADS environment variable for each process to 1 by default, to avoid overloading your system; please tune the variable further for optimal performance in your application as needed.
*****************************************
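The default of one OpenMP thread per process can be overridden before the workers start heavy numeric work. A minimal sketch, assuming 4 local workers (an illustrative number, not taken from this log); the variable must be set before libraries that read it are imported:

```python
import multiprocessing
import os

# Split the machine's cores evenly across a hypothetical 4 local workers;
# the launcher would otherwise pin OMP_NUM_THREADS to 1 for each of them.
workers = 4
threads = max(1, multiprocessing.cpu_count() // workers)
os.environ["OMP_NUM_THREADS"] = str(threads)
print("OMP_NUM_THREADS =", os.environ["OMP_NUM_THREADS"])
```

The same effect can be had by exporting OMP_NUM_THREADS in the shell before invoking the launcher; the right value is typically physical cores divided by processes per node.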
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops require ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
--------------------------------------------------
[YES]cpu_adam .....................cpu_adam cpu_adam [OKAY][YES]............... .....................[YES] [OKAY][YES]...... [OKAY]fused_adam ...... .............[OKAY] [YES] ......fused_adam [OKAY]............. fused_adam[YES] fused_lamb.............fused_adam [YES] ...... ................................ [YES][OKAY][YES][OKAY] ............fused_lamb [OKAY][OKAY]............. fused_lamb [YES].............fused_lamb ......[YES]............. [OKAY]......[YES] sparse_attn [OKAY] .................. [OKAY][NO] ....... [OKAY] transformersparse_attn ........................ sparse_attn[YES] [NO] sparse_attn............ ...... [NO] ...................[OKAY] .......[OKAY][NO] [OKAY]stochastic_transformer....... transformer [OKAY].transformer............ [YES] ............[YES]transformer ......[YES].................. [OKAY] ......[OKAY][YES] [OKAY]...... stochastic_transformer[OKAY] . stochastic_transformer[YES] .stochastic_transformer...... [YES].[OKAY] ......[YES] [OKAY] ...... [OKAY] -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninjaninjaninja .................. .................................... ..................[OKAY] [OKAY][OKAY][OKAY]-------------------------------------------------- -------------------------------------------------- op name---------------------------------------------------------------------------------------------------- op name ................ op name................op nameinstalled installed.................................. installed compatible.. installed .. compatible--------------------------------------------------.. compatible -------------------------------------------------- compatible-------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES]cpu_adam ..................... cpu_adam [OKAY] [YES] cpu_adam............... ......[YES]............... ......[OKAY]fused_adam [YES] ............. [OKAY]......[YES] ......[OKAY] fused_adam [OKAY] ............. [YES] ......fused_adam fused_lamb [OKAY] ............. ............. fused_adam[YES][YES] fused_lamb......................... ............. [OKAY][YES][OKAY] [YES] ...... ......[OKAY]fused_lamb [OKAY]............. [YES] fused_lamb...... [OKAY].............sparse_attn ............[YES] [NO]......sparse_attn .......[OKAY]............ sparse_attn [OKAY]............[NO] [NO]....... transformer ....... [OKAY] ............ [OKAY] [YES]transformer ......transformer............ ............[OKAY]sparse_attn [YES] [YES] ...... stochastic_transformer .................. [OKAY] .[OKAY] [NO][YES] stochastic_transformerstochastic_transformer............. ..[OKAY][OKAY] [YES][YES] ............transformer [OKAY] [OKAY] ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] ninjaninjaninjaninja .................................... .................. 
.................. [OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op name................op nameop name installed................................................ ..installed installedinstalled compatible .. .. ..--------------------------------------------------compatible compatiblecompatible-------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ............... [YES]cpu_adam ..................... cpu_adamcpu_adam [OKAY] [YES]............... ............... ...... [YES][YES][OKAY] ............ fused_adam[OKAY][OKAY] ............. [YES] ......fused_adam [OKAY]............. [YES] fused_adamfused_adam...... fused_lamb[OKAY].......................... .............[YES] [YES] [YES] fused_lamb............ ...................[OKAY] [OKAY] [OKAY] [YES] ...... [OKAY] fused_lambfused_lamb .......................... [YES][YES] ............ sparse_attn[OKAY][OKAY] ............ sparse_attn[NO] ................... [NO][OKAY] ....... [OKAY] transformer sparse_attn............transformer [YES]sparse_attn............ .................. [YES] [NO]............ [OKAY]............. [NO] [OKAY] [OKAY] ....... stochastic_transformer [OKAY]stochastic_transformer.transformer .[YES]transformer ............ ......[YES] ............ [OKAY][YES] ...... [YES] ...... [OKAY] ...... [OKAY] [OKAY] stochastic_transformer stochastic_transformer. .[YES] [YES]...... ......[OKAY] [OKAY] ninjaninjaninjaninja .................................... .................. .................. 
[OKAY] [OKAY][OKAY] -------------------------------------------------- [OKAY] ----------------------------------------------------------------------------------------------------op name ................op nameop name -------------------------------------------------- ................................ installed installed..op name installed compatible .................. .. --------------------------------------------------installed compatiblecompatible ..-------------------------------------------------- -------------------------------------------------- compatible --------------------------------------------------cpu_adam ............... [YES] cpu_adamcpu_adam...... [OKAY] ............... cpu_adam ...............[YES] ...............[YES]...... [OKAY] [YES] ......fused_adam ......[OKAY]............. [OKAY][YES] ......fused_adam [OKAY]............. [YES] fused_adam...... fused_lamb [OKAY]fused_adam ............. ............. ............. [YES][YES] fused_lamb [YES]...... ................... ......[OKAY][OKAY] [YES] ......[OKAY]fused_lamb [OKAY] .............fused_lamb [YES]............. ......[YES] sparse_attn[OKAY]...... ............[OKAY] sparse_attn [NO] ................... [OKAY][NO] .......transformer [OKAY]sparse_attn............ [YES]............transformersparse_attn ......[NO]........................ [YES]....... [OKAY]......[OKAY] [OKAY][NO] transformerstochastic_transformer ....... ............ . stochastic_transformer[OKAY] [YES] .[YES] ......transformer[YES]...... ............ [OKAY][OKAY] ...... [YES][OKAY] stochastic_transformer ...... .[OKAY] [YES] ...... [OKAY]stochastic_transformer . [YES] ...... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io requires the dev libaio .so object and headers but these were not found. ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. 
async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY][OKAY][OKAY] transformer_inference .. [NO] ....... [OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop nameop name ................ ................................ ................ installed installed installedinstalled .. ....compatible.. compatible compatible-------------------------------------------------- compatible utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- cpu_adam ............... [YES]cpu_adam cpu_adam...... cpu_adam ............... [OKAY].............................. [YES][YES][YES] .................. fused_adam[OKAY][OKAY][OKAY] ............. [YES] ...... [OKAY] fused_adamfused_adam fused_lambfused_adam ............. .......................... ............. [YES][YES] [YES] [YES] ...... ............ ...... [OKAY][OKAY] [OKAY] [OKAY] fused_lamb fused_lambfused_lamb............. ..........................[YES] [YES]sparse_attn...... [YES] ......[OKAY]............ ......[OKAY][NO] [OKAY]....... [OKAY] transformer ............ sparse_attn[YES] sparse_attn ............ ...... ............ 
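The libaio warnings above are actionable. A minimal sketch of the suggested fix, assuming a yum-based system; the /opt/libaio prefix is illustrative, not taken from this log:

```shell
# Install the libaio development headers that DeepSpeed's async_io op needs
# (yum-based systems, per the warning above).
sudo yum install -y libaio-devel

# If libaio was instead built from source into a custom prefix, point the
# compiler and linker at it before the op is JIT-compiled.
# /opt/libaio below is a hypothetical install location.
export CFLAGS="-I/opt/libaio/include"
export LDFLAGS="-L/opt/libaio/lib"
```

After this, rerunning the job (or DeepSpeed's `ds_report` utility) should show async_io as compatible.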
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
[WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.async_io ............... [NO] ....... [NO]  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] async_iotransformer_inference ................. [NO][NO] .............. [OKAY][NO] utils .................. [YES] ...... [OKAY] transformer_inference .. [NO] ....... [OKAY] transformer_inference .. quantizer[NO] ..................... [NO][OKAY] ....... [OKAY] utils .................. [YES] ...... [OKAY] utils --------------------------------------------------.................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 
11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 async_io ............... [NO] ....... [NO] nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] transformer_inference .. [NO] ....... [OKAY] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 utils .................. [YES] ...... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] quantizer .............. [NO] ....... [OKAY] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 -------------------------------------------------- deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.21.8.2 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed infodeepspeed info ...................................... 0.5.5+58a8e13, 58a8e13, master0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w.deepspeed wheel compiled w. ............ 
torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.21.8.2 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed infodeepspeed info ...................................... 0.5.5+58a8e13, 58a8e13, master0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. 
async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... DeepSpeed general environment info:torch 1.8, cuda 11.1 torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.2 1.8.2 torch cuda version torch cuda version............... ...............11.1 11.1nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']deepspeed info deepspeed info................... ...................0.5.5+58a8e13, 58a8e13, master 0.5.5+58a8e13, 58a8e13, masterdeepspeed wheel compiled w. deepspeed wheel compiled w....... ......torch 1.8, cuda 11.1 torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] async_iotransformer_inference .. [NO] ...................... [OKAY][NO] ....... [NO] utils .................. [YES] ...... [OKAY] quantizertransformer_inference .............. ..[NO] ....... [OKAY] [NO] ....... [OKAY]-------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 
0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO]  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found........ [OKAY] -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found................ [NO] ....... [NO] async_io ............... transformer_inference[NO] ......... [NO][NO] ....... [OKAY] utils .................. transformer_inference[YES] ........ [NO][OKAY] ....... [OKAY] quantizer .............. [NO]utils ......................... [OKAY][YES] ...... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] [WARNING]  async_io: please install the libaio-devel package with yum utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO]  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. transformer_inference .. [NO] ....... [OKAY]async_io ............... [NO] .......utils ..................[NO] [YES] ...... [OKAY] quantizertransformer_inference ................ [NO][NO] .............. [OKAY][OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found................ [NO] ....... [NO] async_io ............... transformer_inference[NO] ......... [NO][NO] ....... [OKAY] utils .................. 
[YES] ...... [OKAY]transformer_inference .. [NO] quantizer....... ..............[OKAY] [NO] ....... [OKAY]utils .................. [YES] ......-------------------------------------------------- [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... 
[WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
[WARNING]  async_io: please install the libaio-devel package with yum
[WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
**** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss ****
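The environment report above is printed once per rank in dotted `key .... value` lines, which is why the raw stream interleaves and duplicates it. A small hypothetical helper (`parse_report` is not part of DeepSpeed, just a sketch for working with these logs) can recover a clean report as a dict:

```python
import re

def parse_report(text: str) -> dict:
    """Parse DeepSpeed-style 'key .... value' report lines into a dict.

    Lines that do not contain a run of filler dots are ignored.
    """
    result = {}
    for line in text.splitlines():
        # key, a run of 2+ dots used as visual filler, then the value
        m = re.match(r"^(.*?)\s*\.{2,}\s+(.*)$", line.strip())
        if m and m.group(1):
            result[m.group(1)] = m.group(2)
    return result

report = """torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2"""
parsed = parse_report(report)
print(parsed["torch version"], parsed["nvcc version"])
```

This is only a log-post-processing convenience; the same information can be regenerated directly with DeepSpeed's `ds_report` command.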
using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. None
consumed_train_samples .......................... 0
consumed_train_tokens ........................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
curriculum_learning ............................. False
data_impl ....................................... mmap
data_parallel_size .............................. 4
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.1795509.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 100
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 8192
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
gigaflos_no_embeds .............................. 0
global_batch_size ............................... 512
glu_activation .................................. None
hidden_dropout .................................. 0.1
hidden_size ..................................... 2048
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 128
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 200
log_learning_rate_to_tensorboard ................ True
log_level ....................................... None
log_level_replica ............................... None
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_on_targets_only ............................ True
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 0.0002
lr_decay_iters .................................. None
lr_decay_samples ................................ 73242187
lr_decay_style .................................. cosine
lr_decay_tokens ................................. None
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 183105
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/data/gpt2-merges.txt
micro_batch_size ................................ 8
min_loss_scale .................................. 1.0
min_lr .......................................... 1e-05
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 16
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 24
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 4
position_embedding_type ......................... PositionEmbeddingType.absolute
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['32', '32', '2_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 1234
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 4
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/tr6-1B3-prefix-lm-unbiased-loss-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type ..................................
GPT2BPETokenizer train_iters ..................................... None train_samples ................................... 73242187 train_tokens .................................... None use_bnb_optimizer ............................... False use_checkpoint_lr_scheduler ..................... False use_contiguous_buffers_in_ddp ................... False use_cpu_initialization .......................... None use_one_sent_docs ............................... False use_pin_memory .................................. False virtual_pipeline_model_parallel_size ............ None vocab_extra_ids ................................. 0 vocab_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/data/gpt2-vocab.json weight_decay .................................... 0.1 world_size ...................................... 64 zero_allgather_bucket_size ...................... 0.0 zero_contigious_gradients ....................... False zero_reduce_bucket_size ......................... 0.0 zero_reduce_scatter ............................. False zero_stage ...................................... 1 -------------------- end of arguments --------------------- will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples. > building GPT2BPETokenizer tokenizer ... 
**** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss ****
[WARNING]  async_io: please install the libaio-devel package with yum
[WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] .......
[OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
> initializing torch distributed ...
> setting tensorboard ...
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 4
> setting random seeds to 1234 ...
[2021-11-01 17:38:42,115] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
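The `> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)` line above follows from the `make_vocab_size_divisible_by` (128) and `tensor_model_parallel_size` (4) arguments: the tokenizer's 50257 entries are rounded up to a multiple of 128 × 4 = 512 so the embedding table splits evenly across the tensor-parallel ranks (with pipeline 4 and data-parallel 4, that is also what gives `world_size` 64). A sketch of that rounding, assuming this is the rule being applied (the hypothetical helper below reproduces the logged numbers):

```python
def pad_vocab_size(vocab_size, make_divisible_by=128, tp_size=4):
    """Round vocab_size up so each tensor-parallel shard is equal and aligned."""
    multiple = make_divisible_by * tp_size           # 512 here
    return ((vocab_size + multiple - 1) // multiple) * multiple

print(pad_vocab_size(50257))           # 50688
print(pad_vocab_size(50257) - 50257)   # 431 dummy tokens, as in the log
```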
make: Entering directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/data'
g++ -O3 -Wall -shared -std=c++11 -fPIC -fdiagnostics-color -I/gpfswork/rech/six/commun/conda/thomas_prefix_lm/include/python3.8 -I/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/pybind11/include helpers.cpp -o helpers.cpython-38-x86_64-linux-gnu.so
make: Leaving directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 6.236 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers...
(overridable by setting the environment variable MAX_JOBS=N)
[1/3] c++ -MMD -MF scaled_upper_triang_masked_softmax.o.d -DTORCH_EXTENSION_NAME=scaled_upper_triang_masked_softmax_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/THC -isystem /gpfslocalsys/cuda/11.2/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -c /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp -o scaled_upper_triang_masked_softmax.o
[2/3] /gpfslocalsys/cuda/11.2/bin/nvcc --generate-dependencies-with-compile --dependency-output scaled_upper_triang_masked_softmax_cuda.cuda.o.d -DTORCH_EXTENSION_NAME=scaled_upper_triang_masked_softmax_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/THC -isystem /gpfslocalsys/cuda/11.2/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 -gencode arch=compute_70,code=sm_70 --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda -gencode arch=compute_80,code=sm_80 -std=c++14 -c /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu -o scaled_upper_triang_masked_softmax_cuda.cuda.o
[3/3] c++ scaled_upper_triang_masked_softmax.o scaled_upper_triang_masked_softmax_cuda.cuda.o -shared -L/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/gpfslocalsys/cuda/11.2/lib64 -lcudart -o scaled_upper_triang_masked_softmax_cuda.so
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] c++ -MMD -MF scaled_masked_softmax.o.d -DTORCH_EXTENSION_NAME=scaled_masked_softmax_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/THC -isystem /gpfslocalsys/cuda/11.2/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -c /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -o scaled_masked_softmax.o
[2/3] /gpfslocalsys/cuda/11.2/bin/nvcc --generate-dependencies-with-compile --dependency-output scaled_masked_softmax_cuda.cuda.o.d -DTORCH_EXTENSION_NAME=scaled_masked_softmax_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem
/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/THC -isystem /gpfslocalsys/cuda/11.2/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 -gencode arch=compute_70,code=sm_70 --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda -gencode arch=compute_80,code=sm_80 -std=c++14 -c /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -o scaled_masked_softmax_cuda.cuda.o
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h(320): warning: variable "batch_count" was declared but never referenced
[3/3] c++ scaled_masked_softmax.o scaled_masked_softmax_cuda.cuda.o -shared -L/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/gpfslocalsys/cuda/11.2/lib64 -lcudart -o scaled_masked_softmax_cuda.so
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] c++ -MMD -MF layer_norm_cuda.o.d -DTORCH_EXTENSION_NAME=fused_mix_prec_layer_norm_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/THC -isystem /gpfslocalsys/cuda/11.2/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -c /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -o layer_norm_cuda.o
[2/3]
/gpfslocalsys/cuda/11.2/bin/nvcc --generate-dependencies-with-compile --dependency-output layer_norm_cuda_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=fused_mix_prec_layer_norm_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/THC -isystem /gpfslocalsys/cuda/11.2/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 -gencode arch=compute_70,code=sm_70 --use_fast_math -maxrregcount=50 -gencode arch=compute_80,code=sm_80 -std=c++14 -c /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -o layer_norm_cuda_kernel.cuda.o
[3/3] c++ layer_norm_cuda.o layer_norm_cuda_kernel.cuda.o -shared -L/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/gpfslocalsys/cuda/11.2/lib64 -lcudart -o fused_mix_prec_layer_norm_cuda.so
Loading extension module fused_mix_prec_layer_norm_cuda...
Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. 
Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( >>> done with compiling and loading fused kernels. Compilation time: 162.296 seconds time to initialize megatron (seconds): 186.731 [after megatron is initialized] datetime: 2021-11-01 17:41:30 building GPT model ... 
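The warning above comes from `torch.utils.cpp_extension`, which compares the C++ compiler it resolves (here the generic `c++`) against the compiler this PyTorch build was made with (`g++`). A minimal sketch of the usual workaround, assuming `g++` is installed and on `PATH`: point the `CXX` environment variable at `g++` before any fused kernel is JIT-compiled.

```python
import os

# Sketch of a common workaround, not the fix used in this run:
# torch.utils.cpp_extension reads the CXX environment variable when
# picking the C++ compiler (falling back to "c++"), so setting it to
# g++ before the first extension build avoids the warning above.
# Assumes g++ is installed and on PATH.
os.environ["CXX"] = "g++"
```

In a SLURM-style launch this would typically be exported in the job script before the training command, so every rank inherits it.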
[2021-11-01 17:41:30,781] [INFO] [utils.py:806:see_memory_usage] Before Building Model
[2021-11-01 17:41:30,782] [INFO] [utils.py:807:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-11-01 17:41:30,782] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.57 GB, percent = 21.7%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=1, data=0, model=0): 16, ProcessCoord(pipe=1, data=0, model=1): 17, ProcessCoord(pipe=1, data=0, model=2): 18, ProcessCoord(pipe=1, data=0, model=3): 19, ProcessCoord(pipe=1, data=1, model=0): 20, ProcessCoord(pipe=1, data=1, model=1): 21, ProcessCoord(pipe=1, data=1, model=2): 22, ProcessCoord(pipe=1, data=1, model=3): 23, ProcessCoord(pipe=1, data=2, model=0): 24, ProcessCoord(pipe=1, data=2, model=1): 25, ProcessCoord(pipe=1, data=2, model=2): 26, ProcessCoord(pipe=1, data=2, model=3): 27, ProcessCoord(pipe=1, data=3, model=0): 28, ProcessCoord(pipe=1, data=3, model=1): 29, ProcessCoord(pipe=1, data=3, model=2): 30, ProcessCoord(pipe=1, data=3, model=3): 31, ProcessCoord(pipe=2, data=0, model=0): 32, ProcessCoord(pipe=2, data=0, model=1): 33, ProcessCoord(pipe=2, data=0, model=2): 34, ProcessCoord(pipe=2, data=0, model=3): 35, ProcessCoord(pipe=2, data=1, model=0): 36, ProcessCoord(pipe=2, data=1, model=1): 37, ProcessCoord(pipe=2, data=1, model=2): 38, ProcessCoord(pipe=2, data=1, model=3): 39, ProcessCoord(pipe=2, data=2, model=0): 40, ProcessCoord(pipe=2, data=2, model=1): 41, ProcessCoord(pipe=2, data=2, model=2): 42, ProcessCoord(pipe=2, data=2, model=3): 43, ProcessCoord(pipe=2, data=3, model=0): 44, ProcessCoord(pipe=2, data=3, model=1): 45, ProcessCoord(pipe=2, data=3, model=2): 46, ProcessCoord(pipe=2, data=3, model=3): 47, ProcessCoord(pipe=3, data=0, model=0): 48, ProcessCoord(pipe=3, data=0, model=1): 49, ProcessCoord(pipe=3, data=0, model=2): 50, ProcessCoord(pipe=3, data=0, model=3): 51, ProcessCoord(pipe=3, data=1, model=0): 52, ProcessCoord(pipe=3, data=1, model=1): 53, ProcessCoord(pipe=3, data=1, model=2): 54, ProcessCoord(pipe=3, data=1, model=3): 55, ProcessCoord(pipe=3, data=2, model=0): 56, ProcessCoord(pipe=3, data=2, model=1): 57, ProcessCoord(pipe=3, data=2, model=2): 58, ProcessCoord(pipe=3, data=2, model=3): 59, ProcessCoord(pipe=3, data=3, model=0): 60, ProcessCoord(pipe=3, data=3, model=1): 61, ProcessCoord(pipe=3, data=3, model=2): 62, ProcessCoord(pipe=3, data=3, model=3): 63}
[2021-11-01 17:41:31,304] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
stage=1 layers=6
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=2 layers=6
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=3 layers=11
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
    27:
    28:
    29: MixedFusedLayerNorm
    30: EmbeddingPipe
    31: float16_to_fp32
  loss: CrossEntropy
 > number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (1, 3): 105743360
 > number of parameters on (tensor, pipeline) model parallel rank (0, 3): 105743360
 > number of parameters on (tensor, pipeline) model parallel rank (1, 0): 105739264
 > number of parameters on (tensor, pipeline) model parallel rank (2, 0): 105739264
 > number of parameters on (tensor, pipeline) model parallel rank (2, 3): 105743360
 > number of parameters on (tensor, pipeline) model parallel rank (3, 3): 105743360
 > number of parameters on (tensor, pipeline) model parallel rank (3, 0): 105739264
[2021-11-01 17:41:31,677] [INFO] [utils.py:806:see_memory_usage] After Building Model
[2021-11-01 17:41:31,678] [INFO] [utils.py:807:see_memory_usage] MA 0.21 GB Max_MA 0.21 GB CA 0.22 GB Max_CA 0 GB
[2021-11-01 17:41:31,678] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.82 GB, percent = 21.8%
 > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 105739264
setting training iterations to 152972
 > learning rate decay style: cosine
DeepSpeed is enabled.
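The topology dump above assigns each ProcessCoord(pipe, data, model) a global rank in row-major order, with the pipeline coordinate outermost and the tensor-model coordinate innermost. A small sketch (grid sizes 4×4×4 are read off the log; the formula is inferred from the printed mapping) reproduces it:

```python
# Sketch reproducing the rank layout printed above: 4 pipeline stages x
# 4 data-parallel replicas x 4 tensor-model-parallel ranks = 64 processes.
# The row-major formula below is inferred from the printed
# ProcessCoord -> rank mapping, pipe outermost, model innermost.
PIPE, DATA, MODEL = 4, 4, 4

def coord_to_rank(pipe: int, data: int, model: int) -> int:
    return (pipe * DATA + data) * MODEL + model

# Spot-checks against entries visible in the log:
assert coord_to_rank(0, 0, 0) == 0
assert coord_to_rank(1, 0, 0) == 16
assert coord_to_rank(2, 1, 1) == 37
assert coord_to_rank(3, 3, 3) == 63
```

This layout is why ranks 0-15 form pipeline stage 0, 16-31 stage 1, and so on in the per-stage parameter counts that follow.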
[2021-11-01 17:41:31,697] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.5.5+58a8e13, git-hash=58a8e13, git-branch=master
[2021-11-01 17:41:31,770] [INFO] [engine.py:207:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-11-01 17:41:31,770] [INFO] [engine.py:862:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-11-01 17:41:31,770] [INFO] [engine.py:868:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-11-01 17:41:31,771] [INFO] [engine.py:884:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-11-01 17:41:31,772] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-11-01 17:41:31,772] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-11-01 17:41:31,772] [INFO] [stage2.py:111:__init__] Reduce bucket size 500000000
[2021-11-01 17:41:31,772] [INFO] [stage2.py:112:__init__] Allgather bucket size 500000000
[2021-11-01 17:41:31,772] [INFO] [stage2.py:113:__init__] CPU Offload: False
[2021-11-01 17:41:31,772] [INFO] [stage2.py:114:__init__] Round robin gradient partitioning: False
Rank: 39 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 33 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 46 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 32 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 37 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 45 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 44 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 28 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 35 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 36 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 47 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 43 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 41 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 30 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 31 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 29 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 21 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 23 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 22 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 16 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 17 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 20 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 34 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 24 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 26 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 27 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 40 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 25 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 18 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 19 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 38 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 42 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 15 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 60 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 52 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 13 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 1 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 2 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 14 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 48 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 51 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 11 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 55 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 9 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 10 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 56 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 59 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 5 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 6 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 50 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 63 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 3 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 49 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 57 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 7 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 61 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 53 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 4 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 0 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 8 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 54 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 62 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 58 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 12 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
[2021-11-01 17:41:32,076] [INFO] [utils.py:806:see_memory_usage] Before initializing optimizer states
[2021-11-01 17:41:32,077] [INFO] [utils.py:807:see_memory_usage] MA 0.3 GB Max_MA 0.35 GB CA 0.59 GB Max_CA 1 GB
[2021-11-01 17:41:32,077] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 41.54 GB, percent = 22.2%
[2021-11-01 17:41:32,104] [INFO] [utils.py:806:see_memory_usage] After initializing optimizer states
[2021-11-01 17:41:32,104] [INFO] [utils.py:807:see_memory_usage] MA 0.49 GB Max_MA 0.59 GB CA 0.89 GB Max_CA 1 GB
[2021-11-01 17:41:32,105] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 41.54 GB, percent = 22.2%
[2021-11-01 17:41:32,105] [INFO] [stage2.py:474:__init__] optimizer state initialized
[2021-11-01 17:41:32,128] [INFO] [utils.py:806:see_memory_usage] After initializing ZeRO optimizer
[2021-11-01 17:41:32,129] [INFO] [utils.py:807:see_memory_usage] MA 0.49 GB Max_MA 0.49 GB CA 0.89 GB Max_CA 1 GB
[2021-11-01 17:41:32,129] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 41.54 GB, percent = 22.2%
[2021-11-01 17:41:32,129] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-11-01 17:41:32,129] [INFO] [engine.py:599:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-11-01 17:41:32,129] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-11-01 17:41:32,129] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-11-01 17:41:32,129] [INFO] [config.py:940:print] DeepSpeedEngine configuration:
[2021-11-01 17:41:32,129] [INFO] [config.py:944:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile":
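The ZeRO stage-1 shard sizes printed above follow directly from the per-stage parameter counts: each data-parallel rank holds 1/4 of its pipeline stage's parameters, split across the two param groups shown in each "sizes[...]" line. A quick arithmetic check (values read off the log):

```python
# Sketch verifying the ZeRO stage-1 partition sizes logged above: the two
# shard sizes on each "Rank: N ... sizes[...]" line sum to the stage's
# parameter count divided by the 4-way data parallelism.
DATA_PARALLEL = 4  # from the log's 4 data-parallel replicas

# (stage parameter count, shard sizes printed for ranks in that stage)
cases = [
    (75592704, (18874368, 23808)),   # middle stages, e.g. rank 39
    (105739264, (26411008, 23808)),  # first stage, e.g. rank 0
    (105743360, (26411008, 24832)),  # last stage, e.g. rank 48
]
for stage_params, shard_sizes in cases:
    assert sum(shard_sizes) * DATA_PARALLEL == stage_params
```

This is what ZeRO stage 1 partitions: optimizer states (not gradients or weights) are sharded across the data-parallel group, which is why the shards divide evenly by 4 here.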
false }
[2021-11-01 17:41:32,129] [INFO] [config.py:944:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] allreduce_always_fp32 ........ False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] amp_enabled .................. False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] amp_params ................... False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] checkpoint_tag_validation_enabled True
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] checkpoint_tag_validation_fail False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] curriculum_enabled ........... False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] curriculum_params ............ False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] dataloader_drop_last ......... False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] disable_allgather ............ False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] dump_state ................... False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] eigenvalue_enabled ........... False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] eigenvalue_gas_boundary_resolution 1
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] eigenvalue_layer_num ......... 0
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] eigenvalue_max_iter .......... 100
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] eigenvalue_stability ......... 1e-06
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] eigenvalue_tol ............... 0.01
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] eigenvalue_verbose ........... False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] elasticity_enabled ........... False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] fp16_enabled ................. True
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] fp16_master_weights_and_gradients False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] fp16_mixed_quantize .......... False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] global_rank .................. 0
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] gradient_accumulation_steps .. 16
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] gradient_clipping ............ 1.0
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] gradient_predivide_factor .... 1.0
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] initial_dynamic_scale ........ 4096
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] loss_scale ................... 0
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] memory_breakdown ............. False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] optimizer_legacy_fusion ...... False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] optimizer_name ............... None
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] optimizer_params ............. None
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] pld_enabled .................. False
[2021-11-01 17:41:32,130] [INFO] [config.py:944:print] pld_params ................... False
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] prescale_gradients ........... False
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] quantize_change_rate ......... 0.001
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] quantize_groups .............. 1
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] quantize_offset .............. 1000
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] quantize_period .............. 1000
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] quantize_rounding ............ 0
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] quantize_start_bits .......... 16
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] quantize_target_bits ......... 8
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] quantize_training_enabled .... False
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] quantize_type ................ 0
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] quantize_verbose ............. False
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] scheduler_name ............... None
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] scheduler_params ............. None
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] sparse_attention ............. None
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] sparse_gradients_enabled ..... False
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] steps_per_print .............. 2000
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] tensorboard_enabled .......... False
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] tensorboard_output_path ......
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] train_batch_size ............. 512
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] train_micro_batch_size_per_gpu 8
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] use_quantizer_kernel ......... False
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] wall_clock_breakdown ......... False
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] world_size ................... 4
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] zero_allow_untested_optimizer False
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] zero_config .................. { "stage": 1, "contiguous_gradients": true, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] zero_enabled ................. True
[2021-11-01 17:41:32,131] [INFO] [config.py:944:print] zero_optimization_stage ......
1 [2021-11-01 17:41:32,131] [INFO] [config.py:946:print] json = { "train_micro_batch_size_per_gpu": 8, "train_batch_size": 512, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false } [2021-11-01 17:41:32,132] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=16 micro_batch_size=8 [2021-11-01 17:41:32,423] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=105739264 (105.739M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,423] [INFO] [engine.py:151:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=105739264 (105.739M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,423] [INFO] [engine.py:151:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=105739264 (105.739M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,423] [INFO] [engine.py:151:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=105739264 (105.739M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,424] [INFO] [engine.py:151:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,424] [INFO] [engine.py:151:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,424] [INFO] [engine.py:151:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,424] [INFO] [engine.py:151:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) 
TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,424] [INFO] [engine.py:151:__init__] RANK=51 STAGE=3 LAYERS=11 [21, 32) STAGE_PARAMS=105743360 (105.743M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,424] [INFO] [engine.py:151:__init__] RANK=49 STAGE=3 LAYERS=11 [21, 32) STAGE_PARAMS=105743360 (105.743M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,424] [INFO] [engine.py:151:__init__] RANK=48 STAGE=3 LAYERS=11 [21, 32) STAGE_PARAMS=105743360 (105.743M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,424] [INFO] [engine.py:151:__init__] RANK=50 STAGE=3 LAYERS=11 [21, 32) STAGE_PARAMS=105743360 (105.743M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,424] [INFO] [engine.py:151:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,424] [INFO] [engine.py:151:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,424] [INFO] [engine.py:151:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,424] [INFO] [engine.py:151:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M) [2021-11-01 17:41:32,509] [WARNING] [engine.py:2025:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
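The engine banner above can be sanity-checked with plain arithmetic. All numbers come from the log itself; the factor of 4 for tensor parallelism is an inference from the rank layout (ranks 0-3 share stage 0), not something the log states directly.

```python
# Gradient accumulation: global batch / (micro batch x data-parallel degree).
train_batch_size = 512        # "train_batch_size" in the DeepSpeed json above
micro_batch_size = 8          # "train_micro_batch_size_per_gpu"
dp_degree = 4                 # "world_size" in the config printout

micro_batches = train_batch_size // (micro_batch_size * dp_degree)
print(micro_batches)  # 16, matching "CONFIG: micro_batches=16" in the log

# TOTAL_PARAMS appears to equal the per-replica sum of STAGE_PARAMS over the
# four pipeline stages, times an assumed tensor-parallel degree of 4.
stage_params = [105739264, 75592704, 75592704, 105743360]  # stages 0..3
tp_degree = 4
total_params = sum(stage_params) * tp_degree
print(total_params)  # 1450672128, matching TOTAL_PARAMS in the log
```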
WARNING: could not find the metadata file /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints will not load any checkpoints and will start from random
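The load_checkpoint warning above means DeepSpeed could not find the plain-text `latest` tag file it uses to resolve the most recent checkpoint, so this run starts from random weights. A minimal sketch of that mechanism (the directory and tag name below are illustrative, not this run's real paths):

```python
import os
import tempfile

# DeepSpeed's save_checkpoint leaves a one-line file named "latest" in the
# checkpoint directory; its contents are the tag of the newest checkpoint
# (e.g. "global_step1000"). load_checkpoint with no explicit tag reads it.
ckpt_dir = tempfile.mkdtemp()   # stand-in for the real checkpoint directory
tag = "global_step1000"         # illustrative tag name

# What a successful save leaves behind:
with open(os.path.join(ckpt_dir, "latest"), "w") as f:
    f.write(tag)

# Roughly what an untagged load resolves the checkpoint to:
with open(os.path.join(ckpt_dir, "latest")) as f:
    resolved = f.read().strip()
print(resolved)  # global_step1000
```

When the file is absent, either the run never saved a checkpoint (a fresh start, as here) or an explicit tag should be passed when loading.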
time (ms) | load-checkpoint: 1.12
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
estimated model parameters: 1.691828224
estimated model parameters: 1.209483264
estimated model parameters: 1.69189376
estimated model parameters without embeddings: 1.209483264
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.2095488 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with 
the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings 
warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") 
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the 
embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will 
be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage 
hold several copies of the embeddings") estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.69189376 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.209483264 
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.69189376estimated model parameters: 1.69189376 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.209483264estimated model parameters: 1.209483264 
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.69189376 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.691828224estimated model parameters: 1.691828224estimated model parameters: 1.691828224estimated model parameters: 1.691828224 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters without embeddings: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last 
stage hold several copies of the embeddings") estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.69189376 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.691828224 estimated model parameters: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.69189376 estimated model parameters: 1.691828224 estimated model parameters: 1.691828224 estimated model parameters: 1.691828224 estimated model parameters: 1.69189376estimated model parameters: 1.69189376 estimated model parameters: 1.209483264 estimated model parameters: 1.69189376 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.69189376 estimated model parameters: 1.69189376 estimated model parameters without embeddings: 1.2095488 estimated model parameters: 1.69189376 estimated model parameters: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model 
parameters without embeddings: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.2095488 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 estimated model parameters without embeddings: 1.209483264 [after model, optimizer, and learning rate scheduler are built] datetime: 2021-11-01 17:41:32 > building train, validation, and test datasets ... 
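The warning above explains why the per-rank counts disagree: with pipeline parallelism (PP > 1) and tied input/output embeddings, both the first and last pipeline stages hold a copy of the embedding matrix, so a naive sum over stages counts it twice. A minimal sketch of that arithmetic, using assumed dimensions for a ~1.3B GPT-style model (24 layers, hidden size 2048, vocab 50257, sequence length 2048 are illustrative, not read from this log):

```python
# Hedged sketch: rough GPT parameter arithmetic, and why summing per-stage
# counts double-counts embeddings under pipeline parallelism.
# All dimensions below are assumptions for illustration.

def transformer_params(layers, hidden):
    # Per layer: attention projections (4*h^2) + MLP (8*h^2)
    # + biases/layernorms (~13*h); plus a final layernorm (2*h).
    return layers * (12 * hidden**2 + 13 * hidden) + 2 * hidden

def embedding_params(vocab, hidden, seq_len):
    # Token embeddings plus learned position embeddings.
    return vocab * hidden + seq_len * hidden

layers, hidden, vocab, seq_len = 24, 2048, 50257, 2048
body = transformer_params(layers, hidden)
emb = embedding_params(vocab, hidden, seq_len)

true_total = body + emb
# With tied embeddings and PP > 1, the first and last pipeline stages
# each hold the token embedding matrix, so summing over stages sees
# it twice:
naive_pp_sum = body + emb + vocab * hidden

print(f"body only:    {body / 1e9:.3f}B")
print(f"true total:   {true_total / 1e9:.3f}B")
print(f"naive PP sum: {naive_pp_sum / 1e9:.3f}B "
      f"(overcounts by {(naive_pp_sum - true_total) / 1e9:.3f}B)")
```

With these assumed dimensions the transformer body alone lands near 1.21B, which is consistent with the "without embeddings" figures in the log; the exact "with embeddings" figures also depend on vocabulary padding and parallelism layout, which this sketch does not model.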
> datasets target sizes (minimum size):
    train:      73242187
    validation: 7833600
    test:       51200
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 0.143529 seconds
    number of documents: 304230423
 > dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.422 seconds
    total number of samples: 131537224
    total number of epochs: 1
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.328 seconds
    total number of samples: 13854322
    total number of epochs: 2
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.073 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-11-01 17:41:39
done with setup ...
training ...
time (ms) | model-and-optimizer-setup: 1777.43 | train/valid/test-data-iterators-setup: 5796.00
Number of parameters: 1.691828224 billion
Number of parameters: 1.69189376 billion
Number of parameters: 1.209483264 billion
Number of parameters without embeddings: 1.209483264 billion
Number of parameters without embeddings: 1.2095488 billion
[before the start of training step] datetime: 2021-11-01 17:41:39
[2021-11-01 17:41:39,970] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information
[2021-11-01 17:41:39,970] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-11-01 17:41:39,970] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-11-01 17:41:39,970] [INFO] [checkpointing.py:554:forward] ----Synchronization False
[2021-11-01 17:41:39,970] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False
[Rank 17] (after 200 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 2448.73974609375 | reserved: 3814.0 | max reserved: 3814.0
[Rank 1] (after 200 iterations) memory (MB) | allocated: 552.95849609375 | max allocated: 2609.5048828125 | reserved: 4126.0 | max reserved: 4126.0
[Rank 49] (after 200 iterations) memory (MB) | allocated: 1472.916015625 | max allocated: 3314.6748046875 | reserved: 5210.0 | max reserved: 5210.0
[Rank 19] (after 200 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 2448.73974609375 | reserved: 3862.0 | max reserved: 3862.0
[Rank 3] (after 200 iterations) memory (MB) | allocated: 552.95849609375 | max allocated: 2609.5048828125 | reserved: 4238.0 | max reserved: 4238.0
[Rank 51] (after 200 iterations) memory (MB) | allocated: 1472.916015625 | max allocated: 3314.6748046875 | reserved: 6210.0 | max reserved: 6210.0
[Rank 35] (after 200 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 2448.73974609375 | reserved: 3862.0 | max reserved: 3862.0
[Rank 48] (after 200 iterations) memory (MB) | allocated: 1472.916015625 | max allocated: 3314.6748046875 | reserved: 6194.0 | max reserved: 6194.0
[Rank 2] (after 200 iterations) memory (MB) | allocated: 552.95849609375 | max allocated: 2609.5048828125 | reserved: 4238.0 | max reserved: 4238.0
[Rank 18] (after 200 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 2448.73974609375 | reserved: 3862.0 | max reserved: 3862.0
[Rank 50] (after 200 iterations) memory (MB) | allocated: 1472.916015625 | max allocated: 3314.6748046875 | reserved: 5338.0 | max reserved: 5338.0
[Rank 34] (after 200 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 2448.73974609375 | reserved: 3862.0 | max reserved: 3862.0
iteration 200/ 152972 | consumed samples: 6400 | consumed tokens: 13107200 | elapsed time per iteration (ms): 1327.3 | learning rate: 6.991E-06 | global batch size: 32 | lm loss: 8.590840E+00 | loss scale: 4096.0 | grad norm: 8688.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[Rank 16] (after 200 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 2448.73974609375 | reserved: 3894.0 | max reserved: 3894.0
[Rank 0] (after 200 iterations) memory (MB) | allocated: 552.95849609375 | max allocated: 2609.5048828125 | reserved: 4222.0 | max reserved: 4222.0
[Rank 33] (after 200 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 2448.73974609375 | reserved: 3894.0 | max reserved: 3894.0
[Rank 32] (after 200 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 2448.73974609375 | reserved: 3814.0 | max reserved: 3814.0
iteration 400/ 152972 | consumed samples: 12800 | consumed tokens: 26214400 | elapsed time per iteration (ms): 1267.2 | learning rate: 1.398E-05 | global batch size: 32 | lm loss: 7.146135E+00 | loss scale: 4096.0 | grad norm: 4570.467 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 600/ 152972 | consumed samples: 19200 | consumed tokens: 39321600 | elapsed time per iteration (ms): 1278.3 | learning rate: 2.097E-05 | global batch size: 32 | lm loss: 6.750924E+00 | loss scale: 8192.0 | grad norm: 7901.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 800/ 152972 | consumed samples: 25600 | consumed tokens: 52428800 | elapsed time per iteration (ms): 1276.6 | learning rate: 2.796E-05 | global batch size: 32 | lm loss: 6.545265E+00 | loss scale: 8192.0 | grad norm: 7316.658 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 1000/ 152972 | consumed samples: 32000 | consumed tokens: 65536000 | elapsed time per iteration (ms): 1276.7 | learning rate: 3.495E-05 | global batch size: 32 | lm loss: 6.260021E+00 | loss scale: 16384.0 | grad norm: 7908.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
------------------------------------------------------------------------------------------------
 validation loss at iteration 1000 | lm loss value: 6.375450E+00 | lm loss PPL: 5.872494E+02 |
------------------------------------------------------------------------------------------------
iteration 1200/ 152972 | consumed samples: 38400 | consumed tokens: 78643200 | elapsed time per iteration (ms): 1447.9 | learning rate: 4.194E-05 | global batch size: 32 | lm loss: 6.374382E+00 | loss scale: 16384.0 | grad norm: 8613.462 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 1400/ 152972 | consumed samples: 44800 | consumed tokens: 91750400 | elapsed time per iteration (ms): 1279.1 | learning rate: 4.893E-05 | global batch size: 32 | lm loss: 6.279016E+00 | loss scale: 16384.0 | grad norm: 14029.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 1500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-01 18:14:19,464] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/mp_rank_00_model_states.pt
[2021-11-01 18:14:19,477] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/mp_rank_01_model_states.pt
[2021-11-01 18:14:19,850] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_05_optim_states.pt
[2021-11-01 18:14:19,854] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_04_optim_states.pt
[2021-11-01 18:14:19,856] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_09_optim_states.pt
[2021-11-01 18:14:19,856] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_08_optim_states.pt
[2021-11-01 18:14:19,859] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_06_optim_states.pt
[2021-11-01 18:14:19,860] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_04_optim_states.pt
[2021-11-01 18:14:19,861] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_08_optim_states.pt [2021-11-01 18:14:19,861] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_09_optim_states.pt [2021-11-01 18:14:19,867] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_11_optim_states.pt [2021-11-01 18:14:19,867] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_08_optim_states.pt [2021-11-01 18:14:19,869] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_09_optim_states.pt [2021-11-01 18:14:19,872] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_05_optim_states.pt [2021-11-01 18:14:19,872] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_04_optim_states.pt [2021-11-01 18:14:19,873] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_04_optim_states.pt [2021-11-01 18:14:19,878] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_07_optim_states.pt [2021-11-01 18:14:19,878] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_06_optim_states.pt [2021-11-01 18:14:19,878] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_10_optim_states.pt [2021-11-01 18:14:19,880] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_11_optim_states.pt [2021-11-01 18:14:19,881] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_07_optim_states.pt [2021-11-01 18:14:19,882] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_11_optim_states.pt [2021-11-01 18:14:19,882] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_05_optim_states.pt [2021-11-01 18:14:19,882] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_10_optim_states.pt [2021-11-01 18:14:19,884] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_05_optim_states.pt [2021-11-01 18:14:19,885] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_10_optim_states.pt [2021-11-01 18:14:19,888] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_08_optim_states.pt [2021-11-01 18:14:19,890] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_07_optim_states.pt [2021-11-01 18:14:19,892] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_06_optim_states.pt [2021-11-01 18:14:19,892] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_10_optim_states.pt [2021-11-01 18:14:19,899] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_11_optim_states.pt [2021-11-01 18:14:19,906] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_09_optim_states.pt [2021-11-01 18:14:19,906] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_06_optim_states.pt [2021-11-01 18:14:19,911] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_07_optim_states.pt [2021-11-01 18:14:19,986] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_14_optim_states.pt [2021-11-01 18:14:19,993] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_13_optim_states.pt [2021-11-01 18:14:20,005] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_03_optim_states.pt [2021-11-01 18:14:20,006] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_03_optim_states.pt [2021-11-01 18:14:20,007] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-01 18:14:20,007] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_12_optim_states.pt [2021-11-01 18:14:20,008] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_14_optim_states.pt [2021-11-01 18:14:20,008] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_02_optim_states.pt [2021-11-01 18:14:20,012] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_02_optim_states.pt [2021-11-01 18:14:20,019] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_14_optim_states.pt [2021-11-01 18:14:20,020] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_12_optim_states.pt [2021-11-01 18:14:20,021] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_13_optim_states.pt [2021-11-01 18:14:20,026] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-01 18:14:20,026] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-01 18:14:20,026] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_15_optim_states.pt [2021-11-01 18:14:20,026] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_14_optim_states.pt [2021-11-01 18:14:20,029] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-01 18:14:20,029] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_15_optim_states.pt [2021-11-01 18:14:20,032] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_02_optim_states.pt [2021-11-01 18:14:20,033] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-01 18:14:20,039] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_03_optim_states.pt [2021-11-01 18:14:20,040] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_15_optim_states.pt [2021-11-01 18:14:20,048] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-01 18:14:20,048] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_03_optim_states.pt [2021-11-01 18:14:20,048] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_12_optim_states.pt [2021-11-01 18:14:20,051] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-01 18:14:20,053] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-01 18:14:20,054] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_12_optim_states.pt [2021-11-01 18:14:20,056] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_13_optim_states.pt [2021-11-01 18:14:20,058] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_15_optim_states.pt [2021-11-01 18:14:20,061] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_02_optim_states.pt [2021-11-01 18:14:20,063] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_13_optim_states.pt successfully saved checkpoint at iteration 1500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints time (ms) | save-checkpoint: 1206.98 iteration 1600/ 152972 | consumed samples: 51200 | consumed tokens: 104857600 | elapsed time per iteration (ms): 1299.0 | learning rate: 5.592E-05 | global batch size: 32 | lm loss: 6.404619E+00 | loss scale: 32768.0 | grad norm: 21620.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 1800/ 152972 | consumed samples: 57600 | consumed tokens: 117964800 | elapsed time per iteration (ms): 1286.6 | learning rate: 6.291E-05 | global batch size: 32 | lm loss: 5.957126E+00 | loss scale: 32768.0 | grad norm: 19930.566 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [2021-11-01 18:25:04,050] [INFO] [logging.py:68:log_dist] [Rank 0] step=2000, skipped=0, lr=[6.990524562409547e-05, 6.990524562409547e-05], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 2000 loss: 5.5920 iter time (s): 0.001 samples/sec: 50642.485 iteration 2000/ 152972 | consumed samples: 64000 | consumed tokens: 131072000 | elapsed time per iteration (ms): 1284.4 | learning rate: 6.991E-05 | global batch size: 32 | lm loss: 5.872133E+00 | loss scale: 65536.0 | grad norm: 24708.158 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------------ validation loss at iteration 2000 | lm loss value: 5.820941E+00 | lm loss PPL: 3.372893E+02 | 
------------------------------------------------------------------------------------------------ iteration 2200/ 152972 | consumed samples: 70400 | consumed tokens: 144179200 | elapsed time per iteration (ms): 1456.7 | learning rate: 7.690E-05 | global batch size: 32 | lm loss: 5.789141E+00 | loss scale: 65536.0 | grad norm: 44878.845 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 2400/ 152972 | consumed samples: 76800 | consumed tokens: 157286400 | elapsed time per iteration (ms): 1288.4 | learning rate: 8.385E-05 | global batch size: 32 | lm loss: 5.601571E+00 | loss scale: 65536.0 | grad norm: 55717.682 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 2600/ 152972 | consumed samples: 83200 | consumed tokens: 170393600 | elapsed time per iteration (ms): 1298.1 | learning rate: 9.081E-05 | global batch size: 32 | lm loss: 5.861208E+00 | loss scale: 32768.0 | grad norm: 28652.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 2800/ 152972 | consumed samples: 89600 | consumed tokens: 183500800 | elapsed time per iteration (ms): 1292.5 | learning rate: 9.780E-05 | global batch size: 32 | lm loss: 5.622683E+00 | loss scale: 32768.0 | grad norm: 21679.658 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 3000/ 152972 | consumed samples: 96000 | consumed tokens: 196608000 | elapsed time per iteration (ms): 1297.3 | learning rate: 1.048E-04 | global batch size: 32 | lm loss: 5.453251E+00 | loss scale: 32768.0 | grad norm: 13109.309 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------------ validation loss at iteration 3000 | lm loss value: 5.292018E+00 | lm loss PPL: 1.987441E+02 | ------------------------------------------------------------------------------------------------ 
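The figures in these progress lines can be cross-checked against each other (a standalone sketch, not part of the training code): the reported "lm loss PPL" is simply exp of the "lm loss value", and the ratio of consumed tokens to consumed samples implies a sequence length of 2048 tokens per sample.

```python
import math

# Values copied from the validation log lines above.
val_loss_1000, ppl_1000 = 6.375450, 5.872494e+02   # validation at iteration 1000
val_loss_3000, ppl_3000 = 5.292018, 1.987441e+02   # validation at iteration 3000

# lm loss PPL == exp(lm loss value), up to print rounding.
assert abs(math.exp(val_loss_1000) - ppl_1000) < 0.05
assert abs(math.exp(val_loss_3000) - ppl_3000) < 0.05

# consumed tokens / consumed samples gives the per-sample sequence length.
assert 13_107_200 // 6_400 == 2048      # iteration 200
assert 196_608_000 // 96_000 == 2048    # iteration 3000
```

The same arithmetic holds for every iteration line in this section, so it is a quick way to spot a corrupted or mistranscribed metric.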
saving checkpoint at iteration 3000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-01 18:47:43,971] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step3000/mp_rank_01_model_states.pt
[2021-11-01 18:47:44,023] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step3000/mp_rank_00_model_states.pt
[2021-11-01 18:47:44,384] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step3000/zero_pp_rank_3_mp_rank_09_optim_states.pt
[... further "zero checkpoint saved" lines (18:47:44,384 to 18:47:44,568), one per remaining zero_pp_rank_{0-3}_mp_rank_{00-15} optimizer state, elided ...]
[2021-11-01 18:47:44,582] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step3000/zero_pp_rank_1_mp_rank_12_optim_states.pt [2021-11-01 18:47:44,605] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step3000/zero_pp_rank_0_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 3000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints time (ms) | save-checkpoint: 1185.16 iteration 3200/ 152972 | consumed samples: 102400 | consumed tokens: 209715200 | elapsed time per iteration (ms): 1462.9 | learning rate: 1.118E-04 | global batch size: 32 | lm loss: 5.273650E+00 | loss scale: 65536.0 | grad norm: 38824.414 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 3400/ 152972 | consumed samples: 108800 | consumed tokens: 222822400 | elapsed time per iteration (ms): 1286.4 | learning rate: 1.188E-04 | global batch size: 32 | lm loss: 4.597713E+00 | loss scale: 65536.0 | grad norm: 79233.728 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 3600/ 152972 | consumed samples: 115200 | consumed tokens: 235929600 | elapsed time per iteration (ms): 1296.7 | learning rate: 1.258E-04 | global batch size: 32 | lm loss: 3.693162E+00 | loss scale: 131072.0 | grad norm: 103393.553 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 3800/ 152972 | consumed samples: 121600 | consumed tokens: 249036800 | elapsed time per iteration (ms): 1291.4 | learning rate: 1.327E-04 | global batch size: 32 | lm loss: 3.533896E+00 | loss scale: 131072.0 | grad norm: 74243.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [2021-11-01 19:09:15,794] [INFO] [logging.py:68:log_dist] [Rank 0] step=4000, skipped=4, lr=[0.00013967068075694273, 0.00013967068075694273], 
mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 4000 loss: 1.7512 iter time (s): 0.001 samples/sec: 50460.970
iteration 4000/ 152972 | consumed samples: 128000 | consumed tokens: 262144000 | elapsed time per iteration (ms): 1288.4 | learning rate: 1.397E-04 | global batch size: 32 | lm loss: 3.362072E+00 | loss scale: 65536.0 | grad norm: 18497.022 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
------------------------------------------------------------------------------------------------
validation loss at iteration 4000 | lm loss value: 3.268240E+00 | lm loss PPL: 2.626507E+01 |
------------------------------------------------------------------------------------------------
iteration 4200/ 152972 | consumed samples: 135456 | consumed tokens: 277413888 | elapsed time per iteration (ms): 1508.5 | learning rate: 1.478E-04 | global batch size: 64 | lm loss: 3.377852E+00 | loss scale: 65536.0 | grad norm: 28091.974 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 4400/ 152972 | consumed samples: 148256 | consumed tokens: 303628288 | elapsed time per iteration (ms): 1604.1 | learning rate: 1.618E-04 | global batch size: 64 | lm loss: 3.219010E+00 | loss scale: 65536.0 | grad norm: 16919.968 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 4500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-01 19:22:21,049] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step4500/mp_rank_01_model_states.pt
[2021-11-01 19:22:21,100] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step4500/mp_rank_00_model_states.pt
[2021-11-01 19:22:21,456] [INFO] [engine.py:2540:_save_zero_checkpoint]
zero checkpoint saved (all ZeRO partitions, zero_pp_rank_0-3 × mp_rank_00-15) under /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step4500/
successfully saved checkpoint at iteration 4500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1112.04
iteration 4600/ 152972 |
consumed samples: 161056 | consumed tokens: 329842688 | elapsed time per iteration (ms): 1616.3 | learning rate: 1.756E-04 | global batch size: 64 | lm loss: 3.361848E+00 | loss scale: 32768.0 | grad norm: 34305.181 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 4800/ 152972 | consumed samples: 173856 | consumed tokens: 356057088 | elapsed time per iteration (ms): 1607.5 | learning rate: 1.895E-04 | global batch size: 64 | lm loss: 3.095694E+00 | loss scale: 32768.0 | grad norm: 8177.898 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 5000/ 152972 | consumed samples: 186656 | consumed tokens: 382271488 | elapsed time per iteration (ms): 1603.2 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 3.158372E+00 | loss scale: 32768.0 | grad norm: 8572.169 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
------------------------------------------------------------------------------------------------
validation loss at iteration 5000 | lm loss value: 3.023918E+00 | lm loss PPL: 2.057174E+01 |
------------------------------------------------------------------------------------------------
iteration 5200/ 152972 | consumed samples: 199456 | consumed tokens: 408485888 | elapsed time per iteration (ms): 1830.6 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 3.131141E+00 | loss scale: 65536.0 | grad norm: 17721.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 5400/ 152972 | consumed samples: 212256 | consumed tokens: 434700288 | elapsed time per iteration (ms): 1602.8 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 3.115537E+00 | loss scale: 65536.0 | grad norm: 14115.718 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 5600/ 152972 | consumed samples: 225056 | consumed tokens: 460914688 | elapsed time per iteration (ms): 1599.5 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 2.995638E+00 | loss scale: 131072.0 | grad norm: 41343.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 5800/ 152972 | consumed samples: 237856 | consumed tokens: 487129088 | elapsed time per iteration (ms): 1594.3 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 3.052323E+00 | loss scale: 131072.0 | grad norm: 159959.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-01 20:03:09,034] [INFO] [logging.py:68:log_dist] [Rank 0] step=6000, skipped=9, lr=[0.00019999960451637578, 0.00019999960451637578], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 6000/ 152972 | consumed samples: 250656 | consumed tokens: 513343488 | elapsed time per iteration (ms): 1599.6 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 3.011719E+00 | loss scale: 65536.0 | grad norm: 15847.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
steps: 6000 loss: 2.7031 iter time (s): 0.001 samples/sec: 81062.888
------------------------------------------------------------------------------------------------
validation loss at iteration 6000 | lm loss value: 2.982520E+00 | lm loss PPL: 1.973749E+01 |
------------------------------------------------------------------------------------------------
saving checkpoint at iteration 6000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-01 20:03:54,444] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/mp_rank_00_model_states.pt
[2021-11-01 20:03:54,452] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/mp_rank_01_model_states.pt
[2021-11-01 20:03:54,827] [INFO]
[engine.py:2540:_save_zero_checkpoint] zero checkpoint saved (zero_pp_rank_*/mp_rank_* partitions) under /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_3_mp_rank_13_optim_states.pt [2021-11-01 20:03:54,978] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_0_mp_rank_15_optim_states.pt [2021-11-01 20:03:54,979] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-01 20:03:54,979] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_0_mp_rank_02_optim_states.pt [2021-11-01 20:03:54,981] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-01 20:03:54,981] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_3_mp_rank_14_optim_states.pt [2021-11-01 20:03:54,983] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_2_mp_rank_12_optim_states.pt [2021-11-01 20:03:54,989] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_1_mp_rank_03_optim_states.pt [2021-11-01 20:03:55,002] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_3_mp_rank_03_optim_states.pt [2021-11-01 20:03:55,002] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-01 20:03:55,004] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_0_mp_rank_03_optim_states.pt [2021-11-01 20:03:55,006] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_1_mp_rank_15_optim_states.pt [2021-11-01 20:03:55,006] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_1_mp_rank_02_optim_states.pt [2021-11-01 20:03:55,007] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_3_mp_rank_15_optim_states.pt [2021-11-01 20:03:55,008] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-01 20:03:55,009] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_0_mp_rank_12_optim_states.pt [2021-11-01 20:03:55,011] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_2_mp_rank_03_optim_states.pt [2021-11-01 20:03:55,011] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_2_mp_rank_13_optim_states.pt [2021-11-01 20:03:55,011] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-01 20:03:55,012] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-01 20:03:55,013] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_0_mp_rank_13_optim_states.pt [2021-11-01 20:03:55,014] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_2_mp_rank_14_optim_states.pt [2021-11-01 20:03:55,015] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_1_mp_rank_13_optim_states.pt [2021-11-01 20:03:55,021] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step6000/zero_pp_rank_3_mp_rank_12_optim_states.pt successfully saved checkpoint at iteration 6000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints time (ms) | save-checkpoint: 1141.83 iteration 6200/ 152972 | 
consumed samples: 263456 | consumed tokens: 539557888 | elapsed time per iteration (ms): 1853.8 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 2.901359E+00 | loss scale: 65536.0 | grad norm: 15147.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 6400/ 152972 | consumed samples: 281024 | consumed tokens: 575537152 | elapsed time per iteration (ms): 5426.9 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 2.894193E+00 | loss scale: 131072.0 | grad norm: 32218.134 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 6600/ 152972 | consumed samples: 300224 | consumed tokens: 614858752 | elapsed time per iteration (ms): 1904.6 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 2.819356E+00 | loss scale: 131072.0 | grad norm: 20253.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 6800/ 152972 | consumed samples: 319424 | consumed tokens: 654180352 | elapsed time per iteration (ms): 1902.2 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 2.835417E+00 | loss scale: 131072.0 | grad norm: 25013.229 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 7000/ 152972 | consumed samples: 338624 | consumed tokens: 693501952 | elapsed time per iteration (ms): 1902.6 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 2.845205E+00 | loss scale: 262144.0 | grad norm: 54172.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
------------------------------------------------------------------------------------------------
validation loss at iteration 7000 | lm loss value: 2.723926E+00 | lm loss PPL: 1.524004E+01 |
------------------------------------------------------------------------------------------------
iteration 7200/ 152972 | consumed samples: 357824 | consumed tokens: 732823552 | elapsed time per iteration (ms): 2188.8 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 2.742239E+00 | loss scale: 262144.0 | grad norm: 42485.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 7400/ 152972 | consumed samples: 377024 | consumed tokens: 772145152 | elapsed time per iteration (ms): 1899.3 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 2.788103E+00 | loss scale: 524288.0 | grad norm: 88162.119 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 7500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-01 21:03:14,781] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step7500/mp_rank_00_model_states.pt
[2021-11-01 21:03:14,812] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step7500/mp_rank_01_model_states.pt
[2021-11-01 21:03:15,205 ... 21:03:15,458] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step7500/zero_pp_rank_{0..3}_mp_rank_{00..15}_optim_states.pt
successfully saved checkpoint at iteration 7500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1102.90
iteration 7600/ 152972 | consumed samples: 396224 | consumed tokens: 811466752 | elapsed time per iteration (ms): 1902.5 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 2.754191E+00 | loss scale: 131072.0 | grad norm: 24945.175 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 7800/ 152972 | consumed samples: 420544 | consumed tokens: 861274112 | elapsed time per iteration (ms): 2150.3 | learning rate: 2.000E-04 | global batch size: 128 | lm loss: 2.751569E+00 | loss scale: 131072.0 | grad norm: 21880.873 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-01 21:20:57,637] [INFO] [logging.py:68:log_dist] [Rank 0] step=8000,
skipped=12, lr=[0.0001999939570800071, 0.0001999939570800071], mom=[(0.9, 0.999), (0.9, 0.999)] iteration 8000/ 152972 | consumed samples: 446144 | consumed tokens: 913702912 | elapsed time per iteration (ms): 2212.0 | learning rate: 2.000E-04 | global batch size: 128 | lm loss: 2.788481E+00 | loss scale: 131072.0 | grad norm: 22323.475 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | steps: 8000 loss: 2.8012 iter time (s): 0.001 samples/sec: 115833.903 ------------------------------------------------------------------------------------------------ validation loss at iteration 8000 | lm loss value: 2.689925E+00 | lm loss PPL: 1.473057E+01 | ------------------------------------------------------------------------------------------------ iteration 8200/ 152972 | consumed samples: 471744 | consumed tokens: 966131712 | elapsed time per iteration (ms): 2546.5 | learning rate: 2.000E-04 | global batch size: 128 | lm loss: 2.704028E+00 | loss scale: 262144.0 | grad norm: 48355.145 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 8400/ 152972 | consumed samples: 497344 | consumed tokens: 1018560512 | elapsed time per iteration (ms): 2205.6 | learning rate: 2.000E-04 | global batch size: 128 | lm loss: 2.684679E+00 | loss scale: 131072.0 | grad norm: 22705.576 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 8600/ 152972 | consumed samples: 522944 | consumed tokens: 1070989312 | elapsed time per iteration (ms): 2206.3 | learning rate: 2.000E-04 | global batch size: 128 | lm loss: 2.687691E+00 | loss scale: 65536.0 | grad norm: 9859.402 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 8800/ 152972 | consumed samples: 552320 | consumed tokens: 1131151360 | elapsed time per iteration (ms): 2387.2 | learning rate: 2.000E-04 | global batch size: 160 | lm loss: 2.648720E+00 | loss scale: 65536.0 | grad norm: 
10153.437 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 9000/ 152972 | consumed samples: 584320 | consumed tokens: 1196687360 | elapsed time per iteration (ms): 2527.2 | learning rate: 2.000E-04 | global batch size: 160 | lm loss: 2.595457E+00 | loss scale: 65536.0 | grad norm: 8486.364 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------------ validation loss at iteration 9000 | lm loss value: 2.544508E+00 | lm loss PPL: 1.273696E+01 | ------------------------------------------------------------------------------------------------ saving checkpoint at iteration 9000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints [2021-11-01 22:01:50,427] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step9000/mp_rank_00_model_states.pt [2021-11-01 22:01:50,465] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step9000/mp_rank_01_model_states.pt [2021-11-01 22:01:50,823] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step9000/zero_pp_rank_2_mp_rank_05_optim_states.pt [2021-11-01 22:01:50,824] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step9000/zero_pp_rank_0_mp_rank_04_optim_states.pt [2021-11-01 22:01:50,826] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step9000/zero_pp_rank_1_mp_rank_11_optim_states.pt [2021-11-01 
22:01:50,827] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step9000/zero_pp_rank_2_mp_rank_02_optim_states.pt
[2021-11-01 22:01:51,005] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step9000/zero_pp_rank_1_mp_rank_13_optim_states.pt
[2021-11-01 22:01:51,006] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step9000/zero_pp_rank_3_mp_rank_00_optim_states.pt
[2021-11-01 22:01:51,007] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step9000/zero_pp_rank_1_mp_rank_02_optim_states.pt
[2021-11-01 22:01:51,013] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step9000/zero_pp_rank_2_mp_rank_15_optim_states.pt
successfully saved checkpoint at iteration 9000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1087.59
 iteration     9200/  152972 | consumed samples: 616320 | consumed tokens: 1262223360 | elapsed time per iteration (ms): 2939.1 | learning rate: 2.000E-04 | global batch size: 160 | lm loss: 2.584026E+00 | loss scale: 131072.0 | grad norm: 18164.939 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration     9400/  152972 | consumed samples: 648320 | consumed tokens: 1327759360 | elapsed time per iteration (ms): 2554.0 | learning rate: 2.000E-04 | global batch size: 160 | lm loss: 2.627375E+00 | loss scale: 131072.0 | grad norm: 18730.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration     9600/  152972 | consumed samples: 683040 | consumed tokens: 1398865920 |
elapsed time per iteration (ms): 2682.6 | learning rate: 2.000E-04 | global batch size: 192 | lm loss: 2.597666E+00 | loss scale: 262144.0 | grad norm: 32685.020 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration     9800/  152972 | consumed samples: 721440 | consumed tokens: 1477509120 | elapsed time per iteration (ms): 2857.6 | learning rate: 2.000E-04 | global batch size: 192 | lm loss: 2.547980E+00 | loss scale: 262144.0 | grad norm: 33598.604 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-01 22:46:50,240] [INFO] [logging.py:68:log_dist] [Rank 0] step=10000, skipped=16, lr=[0.00019997091981206023, 0.00019997091981206023], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 10000 loss: 2.6857 iter time (s): 0.001 samples/sec: 135371.169
 iteration    10000/  152972 | consumed samples: 759840 | consumed tokens: 1556152320 | elapsed time per iteration (ms): 2856.9 | learning rate: 2.000E-04 | global batch size: 192 | lm loss: 2.548814E+00 | loss scale: 262144.0 | grad norm: 33355.704 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
 validation loss at iteration 10000 | lm loss value: 2.537272E+00 | lm loss PPL: 1.264513E+01 |
-------------------------------------------------------------------------------------------------
 iteration    10200/  152972 | consumed samples: 798240 | consumed tokens: 1634795520 | elapsed time per iteration (ms): 3293.9 | learning rate: 2.000E-04 | global batch size: 192 | lm loss: 2.511054E+00 | loss scale: 524288.0 | grad norm: 72033.443 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration    10400/  152972 | consumed samples: 842720 | consumed tokens: 1725890560 | elapsed time per iteration (ms): 3153.9 | learning rate: 2.000E-04 | global batch size: 224 | lm loss: 2.503509E+00 | loss scale: 262144.0 | grad
norm: 32736.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 10500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-01 23:13:37,746] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step10500/mp_rank_01_model_states.pt
[2021-11-01 23:13:37,769] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step10500/mp_rank_00_model_states.pt
[2021-11-01 23:13:38,134] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step10500/zero_pp_rank_0_mp_rank_04_optim_states.pt
[2021-11-01 23:13:38,135] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step10500/zero_pp_rank_2_mp_rank_09_optim_states.pt
[2021-11-01 23:13:38,135] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step10500/zero_pp_rank_2_mp_rank_06_optim_states.pt
[2021-11-01 23:13:38,136] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step10500/zero_pp_rank_2_mp_rank_05_optim_states.pt
[2021-11-01 23:13:38,136] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step10500/zero_pp_rank_0_mp_rank_07_optim_states.pt
[2021-11-01 23:13:38,136] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step10500/zero_pp_rank_2_mp_rank_15_optim_states.pt [2021-11-01 23:13:38,313] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step10500/zero_pp_rank_1_mp_rank_12_optim_states.pt [2021-11-01 23:13:38,315] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step10500/zero_pp_rank_0_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 10500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints time (ms) | save-checkpoint: 1151.58 iteration 10600/ 152972 | consumed samples: 887520 | consumed tokens: 1817640960 | elapsed time per iteration (ms): 3175.6 | learning rate: 2.000E-04 | global batch size: 224 | lm loss: 2.470897E+00 | loss scale: 262144.0 | grad norm: 29970.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 10800/ 152972 | consumed samples: 932320 | consumed tokens: 1909391360 | elapsed time per iteration (ms): 3164.1 | learning rate: 2.000E-04 | global batch size: 224 | lm loss: 2.467487E+00 | loss scale: 131072.0 | grad norm: 16202.335 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 11000/ 152972 | consumed samples: 983360 | consumed tokens: 2013921280 | elapsed time per iteration (ms): 3472.6 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.705436E+00 | loss scale: 131072.0 | grad norm: 23590.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------------- validation loss at iteration 11000 | lm loss value: 2.527613E+00 | lm loss PPL: 1.252358E+01 | 
-------------------------------------------------------------------------------------------------
iteration 11200/ 152972 | consumed samples: 1034560 | consumed tokens: 2118778880 | elapsed time per iteration (ms): 4020.5 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 2.485469E+00 | loss scale: 262144.0 | grad norm: 29278.673 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 11400/ 152972 | consumed samples: 1088128 | consumed tokens: 2228486144 | elapsed time per iteration (ms): 3587.0 | learning rate: 1.999E-04 | global batch size: 288 | lm loss: 2.469676E+00 | loss scale: 262144.0 | grad norm: 30222.671 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 11600/ 152972 | consumed samples: 1145728 | consumed tokens: 2346450944 | elapsed time per iteration (ms): 3785.3 | learning rate: 1.999E-04 | global batch size: 288 | lm loss: 2.433655E+00 | loss scale: 262144.0 | grad norm: 29033.436 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 11800/ 152972 | consumed samples: 1203680 | consumed tokens: 2465136640 | elapsed time per iteration (ms): 3816.4 | learning rate: 1.999E-04 | global batch size: 320 | lm loss: 2.421053E+00 | loss scale: 524288.0 | grad norm: 54629.737 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-02 00:45:27,945] [INFO] [logging.py:68:log_dist] [Rank 0] step=12000, skipped=19, lr=[0.00019989706888811533, 0.00019989706888811533], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 12000/ 152972 | consumed samples: 1267680 | consumed tokens: 2596208640 | elapsed time per iteration (ms): 4119.3 | learning rate: 1.999E-04 | global batch size: 320 | lm loss: 2.395784E+00 | loss scale: 524288.0 | grad norm: 56777.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
steps: 12000 loss: 2.6347 iter time (s): 0.002 samples/sec: 154942.719
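Two internal relationships in the records above can be sanity-checked directly: the reported "lm loss PPL" in the validation blocks is simply exp of the lm loss value, and "consumed tokens" is "consumed samples" times a fixed 2048-token sequence length. A minimal check in Python, using values copied from the log lines above:

```python
import math

# Validation at iteration 11000: lm loss 2.527613E+00 -> reported PPL 1.252358E+01
lm_loss = 2.527613
ppl = math.exp(lm_loss)
assert abs(ppl - 12.52358) < 1e-3  # PPL = exp(lm loss)

# Iteration 10600: consumed samples 887520, consumed tokens 1817640960
# -> exactly 2048 tokens per sample, i.e. a sequence length of 2048
assert 1817640960 // 887520 == 2048
assert 887520 * 2048 == 1817640960
```

The same identities hold for every validation block and iteration record in this section.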
-------------------------------------------------------------------------------------------------
validation loss at iteration 12000 | lm loss value: 2.391201E+00 | lm loss PPL: 1.092661E+01 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 12000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-02 00:47:41,437] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step12000/mp_rank_01_model_states.pt
[2021-11-02 00:47:41,476] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step12000/mp_rank_00_model_states.pt
[2021-11-02 00:47:41,837] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step12000/zero_pp_rank_1_mp_rank_07_optim_states.pt
[... analogous "zero checkpoint saved" messages for the remaining zero_pp_rank_*_mp_rank_*_optim_states.pt files at global_step12000 ...]
successfully saved checkpoint at iteration 12000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1175.72
iteration 12200/ 152972 | consumed samples: 1331680 | consumed tokens: 2727280640 | elapsed time per iteration (ms): 4799.6 | learning rate: 1.999E-04 | global batch size: 320 | lm loss: 2.389797E+00 | loss scale: 1048576.0 | grad norm: 102065.995 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 12400/ 152972 | consumed samples: 1401888 | consumed tokens: 2871066624 | elapsed time per iteration (ms): 4448.0 | learning rate: 1.999E-04 | global batch size: 352 | lm loss: 2.393252E+00 | loss scale: 1048576.0 | grad norm: 104610.634 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 12600/ 152972 | consumed samples: 1472768 | consumed tokens: 3016228864 | elapsed time per iteration (ms): 4474.1 | learning rate: 1.999E-04 | global batch size: 384 | lm loss: 2.360604E+00 | loss scale: 1048576.0 | grad norm: 107325.199 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 12800/ 152972 | consumed samples: 1549568 | consumed tokens: 3173515264 | elapsed time per iteration (ms): 4778.3 | learning rate: 1.998E-04 | global batch size: 384 | lm loss: 2.363876E+00 | loss scale: 2097152.0 | grad norm: 213088.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 13000/ 152972 | consumed samples: 1628544 | consumed tokens: 3335258112 | elapsed time per iteration (ms): 4880.2 | learning rate: 1.998E-04 | global batch size: 416 | lm loss: 2.356655E+00 | loss scale: 2097152.0 | grad norm: 200363.789 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
validation loss at iteration 13000 | lm loss value: 2.326353E+00 | lm loss PPL: 1.024052E+01 |
-------------------------------------------------------------------------------------------------
iteration 13200/ 152972 | consumed samples: 1711744 | consumed tokens: 3505651712 | elapsed time per iteration (ms): 5944.0 | learning rate: 1.998E-04 | global batch size: 416 | lm loss: 2.333088E+00 | loss scale: 2097152.0 | grad norm: 187087.543 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 13400/ 152972 | consumed samples: 1799680 | consumed tokens: 3685744640 | elapsed time per iteration (ms): 5317.6 | learning rate: 1.998E-04 | global batch size: 448 | lm loss: 2.322860E+00 | loss scale: 1048576.0 | grad norm: 98604.665 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 13500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-02 02:49:55,953] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step13500/mp_rank_01_model_states.pt
[2021-11-02 02:49:55,987] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step13500/mp_rank_00_model_states.pt
[2021-11-02 02:49:56,349] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step13500/zero_pp_rank_0_mp_rank_11_optim_states.pt
[... analogous "zero checkpoint saved" messages for the remaining zero_pp_rank_*_mp_rank_*_optim_states.pt files at global_step13500 ...]
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step13500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-02 02:49:56,528] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step13500/zero_pp_rank_2_mp_rank_02_optim_states.pt [2021-11-02 02:49:56,528] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step13500/zero_pp_rank_0_mp_rank_12_optim_states.pt [2021-11-02 02:49:56,529] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step13500/zero_pp_rank_2_mp_rank_15_optim_states.pt [2021-11-02 02:49:56,530] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step13500/zero_pp_rank_3_mp_rank_13_optim_states.pt [2021-11-02 02:49:56,532] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step13500/zero_pp_rank_1_mp_rank_13_optim_states.pt [2021-11-02 02:49:56,535] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step13500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-02 02:49:56,535] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step13500/zero_pp_rank_1_mp_rank_03_optim_states.pt [2021-11-02 02:49:56,565] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step13500/zero_pp_rank_0_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 13500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1128.47
iteration 13600/ 152972 | consumed samples: 1890880 | consumed tokens: 3872522240 | elapsed time per iteration (ms): 5483.7 | learning rate: 1.997E-04 | global batch size: 480 | lm loss: 2.318233E+00 | loss scale: 1048576.0 | grad norm: 93357.413 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 13800/ 152972 | consumed samples: 1986880 | consumed tokens: 4069130240 | elapsed time per iteration (ms): 5725.4 | learning rate: 1.997E-04 | global batch size: 480 | lm loss: 2.316036E+00 | loss scale: 262144.0 | grad norm: 25542.089 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-02 03:38:14,984] [INFO] [logging.py:68:log_dist] [Rank 0] step=14000, skipped=23, lr=[0.00019968259658442148, 0.00019968259658442148], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 14000/ 152972 | consumed samples: 2088384 | consumed tokens: 4277010432 | elapsed time per iteration (ms): 5984.3 | learning rate: 1.997E-04 | global batch size: 512 | lm loss: 2.304354E+00 | loss scale: 262144.0 | grad norm: 22072.428 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
steps: 14000 loss: 2.1247 iter time (s): 0.003 samples/sec: 169808.536
-------------------------------------------------------------------------------------------------
validation loss at iteration 14000 | lm loss value: 2.287617E+00 | lm loss PPL: 9.851433E+00 |
-------------------------------------------------------------------------------------------------
iteration 14200/ 152972 | consumed samples: 2190784 | consumed tokens: 4486725632 | elapsed time per iteration (ms): 7030.1 | learning rate: 1.996E-04 | global batch size: 512 | lm loss: 3.456157E+00 | loss scale: 16384.0 | grad norm: 27750.069 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 14400/ 152972 | consumed samples: 2293184 | consumed tokens: 4696440832 | elapsed time per iteration (ms): 6019.1 | learning rate: 1.996E-04 | global batch size: 512 | lm loss: 2.471888E+00 | loss scale: 16384.0 | grad norm: 1553.035 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 14600/ 152972 | consumed samples: 2395584 | consumed tokens: 4906156032 | elapsed time per iteration (ms): 6023.3 | learning rate: 1.996E-04 | global batch size: 512 | lm loss: 2.308169E+00 | loss scale: 16384.0 | grad norm: 1517.946 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 14800/ 152972 | consumed samples: 2497984 | consumed tokens: 5115871232 | elapsed time per iteration (ms): 6021.9 | learning rate: 1.995E-04 | global batch size: 512 | lm loss: 2.292671E+00 | loss scale: 32768.0 | grad norm: 3138.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 15000/ 152972 | consumed samples: 2600384 | consumed tokens: 5325586432 | elapsed time per iteration (ms): 6013.2 | learning rate: 1.995E-04 | global batch size: 512 | lm loss: 2.295139E+00 | loss scale: 32768.0 | grad norm: 3061.595 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
validation loss at iteration 15000 | lm loss value: 2.267282E+00 | lm loss PPL: 9.653128E+00 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 15000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-02 05:25:19,020] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint:
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step15000/mp_rank_00_model_states.pt
[2021-11-02 05:25:19,065] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step15000/mp_rank_01_model_states.pt
[2021-11-02 05:25:19,454] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step15000/zero_pp_rank_0_mp_rank_10_optim_states.pt
[... analogous "zero checkpoint saved" records, timestamps 05:25:19,456 to 05:25:19,643, for the remaining zero_pp_rank_{0-3} x mp_rank_{00-15} optimizer-state shards of global_step15000 ...]
[2021-11-02 05:25:19,643] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step15000/zero_pp_rank_1_mp_rank_15_optim_states.pt
[2021-11-02 05:25:19,643] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step15000/zero_pp_rank_2_mp_rank_12_optim_states.pt
successfully saved checkpoint at iteration 15000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1170.78
iteration 15200/ 152972 | consumed samples: 2702784 | consumed tokens: 5535301632 | elapsed time per iteration (ms): 7027.8 | learning rate: 1.994E-04 | global batch size: 512 | lm loss: 2.293391E+00 | loss scale: 65536.0 | grad norm: 5879.056 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 15400/ 152972 | consumed samples: 2805184 | consumed tokens: 5745016832 | elapsed time per iteration (ms): 6020.1 | learning rate: 1.994E-04 | global batch size: 512 | lm loss: 2.270763E+00 | loss scale: 65536.0 | grad norm: 6939.875 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 15600/ 152972 | consumed samples: 2907584 | consumed tokens: 5954732032 | elapsed time per iteration (ms): 6011.7 | learning rate: 1.994E-04 | global batch size: 512 | lm loss: 2.262162E+00 | loss scale: 65536.0 | grad norm: 5802.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 15800/ 152972 | consumed samples: 3009984 | consumed tokens: 6164447232 | elapsed time per iteration (ms): 5994.6 | learning rate: 1.993E-04 | global batch size: 512 | lm loss: 2.254617E+00 | loss scale: 131072.0 | grad norm: 12598.649 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-02 07:05:26,628] [INFO] [logging.py:68:log_dist] [Rank 0] step=16000, skipped=27, lr=[0.00019925032117609708, 0.00019925032117609708], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 16000 loss: 2.1199 iter time (s): 0.003 samples/sec: 171120.084
iteration 16000/ 152972 | consumed samples: 3112384 | consumed tokens: 6374162432 | elapsed time per iteration (ms): 5996.5 | learning rate: 1.993E-04 | global batch size: 512 | lm loss: 2.264790E+00 | loss scale: 131072.0 | grad norm: 11840.026 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
validation loss at iteration 16000 | lm loss value: 2.238354E+00 | lm loss PPL: 9.377885E+00 |
-------------------------------------------------------------------------------------------------
iteration 16200/ 152972 | consumed samples: 3214784 | consumed tokens: 6583877632 | elapsed time per iteration (ms): 7006.8 | learning rate: 1.992E-04 | global batch size: 512 | lm loss: 2.260806E+00 | loss scale: 262144.0 | grad norm: 26196.926 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 16400/ 152972 | consumed samples: 3317184 | consumed tokens: 6793592832 | elapsed time per iteration (ms): 6025.1 | learning rate: 1.991E-04 | global batch size: 512 | lm loss: 2.253266E+00 | loss scale: 262144.0 | grad norm: 22496.516 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 16500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-02 07:58:57,103] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/mp_rank_00_model_states.pt
[2021-11-02 07:58:57,152] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/mp_rank_01_model_states.pt
[2021-11-02 07:58:57,508]
[INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_0_mp_rank_10_optim_states.pt
[... analogous "zero checkpoint saved" records, timestamps 07:58:57,516 to 07:58:57,640, for the remaining zero_pp_rank_{0-3} x mp_rank_{00-15} optimizer-state shards of global_step16500 ...]
[2021-11-02 07:58:57,640] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-02 07:58:57,641] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-02 07:58:57,644] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_3_mp_rank_12_optim_states.pt [2021-11-02 07:58:57,645] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_1_mp_rank_03_optim_states.pt [2021-11-02 07:58:57,647] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_3_mp_rank_03_optim_states.pt [2021-11-02 07:58:57,651] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_0_mp_rank_14_optim_states.pt [2021-11-02 07:58:57,654] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-02 07:58:57,655] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_2_mp_rank_15_optim_states.pt [2021-11-02 07:58:57,657] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_1_mp_rank_15_optim_states.pt [2021-11-02 07:58:57,658] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_0_mp_rank_12_optim_states.pt [2021-11-02 07:58:57,659] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_3_mp_rank_13_optim_states.pt [2021-11-02 07:58:57,662] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_1_mp_rank_13_optim_states.pt [2021-11-02 07:58:57,662] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_2_mp_rank_12_optim_states.pt [2021-11-02 07:58:57,663] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-02 07:58:57,664] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-02 07:58:57,676] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_2_mp_rank_13_optim_states.pt [2021-11-02 07:58:57,679] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_3_mp_rank_15_optim_states.pt [2021-11-02 07:58:57,680] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-02 07:58:57,680] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_1_mp_rank_14_optim_states.pt [2021-11-02 07:58:57,682] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_3_mp_rank_02_optim_states.pt [2021-11-02 07:58:57,683] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_0_mp_rank_15_optim_states.pt [2021-11-02 07:58:57,684] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_0_mp_rank_13_optim_states.pt [2021-11-02 07:58:57,685] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-02 07:58:57,686] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_0_mp_rank_02_optim_states.pt [2021-11-02 07:58:57,686] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_2_mp_rank_14_optim_states.pt
[2021-11-02 07:58:57,690] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_1_mp_rank_12_optim_states.pt
[2021-11-02 07:58:57,690] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_1_mp_rank_02_optim_states.pt
[2021-11-02 07:58:57,691] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_0_mp_rank_03_optim_states.pt
[2021-11-02 07:58:57,692] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_3_mp_rank_14_optim_states.pt
[2021-11-02 07:58:57,694] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_2_mp_rank_02_optim_states.pt
[2021-11-02 07:58:57,694] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_3_mp_rank_00_optim_states.pt
[2021-11-02 07:58:57,697] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step16500/zero_pp_rank_2_mp_rank_03_optim_states.pt
 successfully saved checkpoint at iteration 16500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1087.73
 iteration    16600/  152972 | consumed samples: 3419584 | consumed tokens: 7003308032 | elapsed time per iteration (ms): 6040.9 | learning rate: 1.991E-04 | global batch size: 512 | lm loss: 2.247239E+00 | loss scale: 262144.0 | grad norm: 24826.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration    16800/  152972 | consumed samples: 3521984 | consumed tokens: 7213023232 | elapsed time per iteration (ms): 6023.2 | learning rate: 1.990E-04 | global batch size: 512 | lm loss: 2.242061E+00 | loss scale: 524288.0 | grad norm: 43277.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration    17000/  152972 | consumed samples: 3624384 | consumed tokens: 7422738432 | elapsed time per iteration (ms): 6021.0 | learning rate: 1.990E-04 | global batch size: 512 | lm loss: 2.212666E+00 | loss scale: 524288.0 | grad norm: 53420.714 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
 validation loss at iteration 17000 | lm loss value: 2.208052E+00 | lm loss PPL: 9.097976E+00 |
-------------------------------------------------------------------------------------------------
 iteration    17200/  152972 | consumed samples: 3726784 | consumed tokens: 7632453632 | elapsed time per iteration (ms): 7048.9 | learning rate: 1.989E-04 | global batch size: 512 | lm loss: 2.228615E+00 | loss scale: 1048576.0 | grad norm: 92680.708 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration    17400/  152972 | consumed samples: 3829184 | consumed tokens: 7842168832 | elapsed time per iteration (ms): 6018.9 | learning rate: 1.988E-04 | global batch size: 512 | lm loss: 2.239331E+00 | loss scale: 1048576.0 | grad norm: 94849.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration    17600/  152972 | consumed samples: 3931584 | consumed tokens: 8051884032 | elapsed time per iteration (ms): 5998.8 | learning rate: 1.988E-04 | global batch size: 512 | lm loss: 2.226987E+00 | loss scale: 1048576.0 | grad norm: 100139.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration    17800/  152972 | consumed samples: 4033984 | consumed tokens: 8261599232 | elapsed time per iteration (ms): 6035.2 | learning rate: 1.987E-04 | global batch size: 512 | lm loss: 2.213699E+00 | loss scale: 1048576.0 | grad norm: 87903.871 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-02 10:32:56,352] [INFO] [logging.py:68:log_dist] [Rank 0] step=18000, skipped=28, lr=[0.000198635005451171, 0.000198635005451171], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 18000 loss: 2.1605 iter time (s): 0.003 samples/sec: 170850.458
 iteration    18000/  152972 | consumed samples: 4136384 | consumed tokens: 8471314432 | elapsed time per iteration (ms): 6029.8 | learning rate: 1.986E-04 | global batch size: 512 | lm loss: 2.212738E+00 | loss scale: 1048576.0 | grad norm: 86888.361 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
 validation loss at iteration 18000 | lm loss value: 2.182227E+00 | lm loss PPL: 8.866028E+00 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 18000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-02 10:36:24,657] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/mp_rank_01_model_states.pt
[2021-11-02 10:36:24,690] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint:
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/mp_rank_00_model_states.pt [2021-11-02 10:36:25,043] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_10_optim_states.pt [2021-11-02 10:36:25,043] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_08_optim_states.pt [2021-11-02 10:36:25,050] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_10_optim_states.pt [2021-11-02 10:36:25,050] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_08_optim_states.pt [2021-11-02 10:36:25,051] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_10_optim_states.pt [2021-11-02 10:36:25,055] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_06_optim_states.pt [2021-11-02 10:36:25,055] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_05_optim_states.pt [2021-11-02 10:36:25,056] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_10_optim_states.pt [2021-11-02 10:36:25,056] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_07_optim_states.pt [2021-11-02 10:36:25,056] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_11_optim_states.pt [2021-11-02 10:36:25,057] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_07_optim_states.pt [2021-11-02 10:36:25,057] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_11_optim_states.pt [2021-11-02 10:36:25,057] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_06_optim_states.pt [2021-11-02 10:36:25,058] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_04_optim_states.pt [2021-11-02 10:36:25,070] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_05_optim_states.pt [2021-11-02 10:36:25,076] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_06_optim_states.pt [2021-11-02 10:36:25,076] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_04_optim_states.pt [2021-11-02 10:36:25,077] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_05_optim_states.pt [2021-11-02 10:36:25,078] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_09_optim_states.pt [2021-11-02 10:36:25,078] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_09_optim_states.pt [2021-11-02 10:36:25,080] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_07_optim_states.pt [2021-11-02 10:36:25,080] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_11_optim_states.pt [2021-11-02 10:36:25,082] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_08_optim_states.pt [2021-11-02 10:36:25,084] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_09_optim_states.pt [2021-11-02 10:36:25,084] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_05_optim_states.pt [2021-11-02 10:36:25,086] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_04_optim_states.pt [2021-11-02 10:36:25,086] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_06_optim_states.pt [2021-11-02 10:36:25,088] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_07_optim_states.pt [2021-11-02 10:36:25,088] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_09_optim_states.pt [2021-11-02 10:36:25,088] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_11_optim_states.pt [2021-11-02 10:36:25,089] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_04_optim_states.pt [2021-11-02 10:36:25,092] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_08_optim_states.pt [2021-11-02 10:36:25,176] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_03_optim_states.pt [2021-11-02 10:36:25,183] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_12_optim_states.pt [2021-11-02 10:36:25,185] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_02_optim_states.pt [2021-11-02 10:36:25,186] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_15_optim_states.pt [2021-11-02 10:36:25,186] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_13_optim_states.pt [2021-11-02 10:36:25,189] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_14_optim_states.pt [2021-11-02 10:36:25,191] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_13_optim_states.pt [2021-11-02 10:36:25,191] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_02_optim_states.pt [2021-11-02 10:36:25,191] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_03_optim_states.pt [2021-11-02 10:36:25,192] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_14_optim_states.pt [2021-11-02 10:36:25,192] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_03_optim_states.pt [2021-11-02 10:36:25,192] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_02_optim_states.pt [2021-11-02 10:36:25,197] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_13_optim_states.pt [2021-11-02 10:36:25,207] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-02 10:36:25,210] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-02 10:36:25,219] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_12_optim_states.pt [2021-11-02 10:36:25,221] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_12_optim_states.pt [2021-11-02 10:36:25,222] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_14_optim_states.pt [2021-11-02 10:36:25,223] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-02 10:36:25,224] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_12_optim_states.pt [2021-11-02 10:36:25,227] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_15_optim_states.pt [2021-11-02 10:36:25,227] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-02 10:36:25,227] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-02 10:36:25,230] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_14_optim_states.pt [2021-11-02 10:36:25,233] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-02 10:36:25,236] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_15_optim_states.pt [2021-11-02 10:36:25,236] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_15_optim_states.pt [2021-11-02 10:36:25,237] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-02 10:36:25,241] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_13_optim_states.pt [2021-11-02 10:36:25,244] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_02_optim_states.pt [2021-11-02 10:36:25,245] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_03_optim_states.pt [2021-11-02 10:36:25,254] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_00_optim_states.pt
 successfully saved checkpoint at iteration 18000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1114.76
 iteration    18200/  152972 | consumed samples: 4238784 | consumed tokens: 8681029632 | elapsed time per iteration (ms): 7066.9 | learning rate: 1.986E-04 | global batch size: 512 | lm loss: 2.219932E+00 | loss scale: 2097152.0 | grad norm: 195940.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration    18400/  152972 | consumed samples: 4341184 | consumed tokens: 8890744832 | elapsed time per iteration (ms): 6037.0 | learning rate: 1.985E-04 | global batch size: 512 | lm loss: 2.213413E+00 | loss scale: 524288.0 | grad norm: 41907.289 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration    18600/  152972 | consumed samples: 4443584 | consumed tokens: 9100460032 | elapsed time per iteration (ms): 6015.2 | learning rate: 1.984E-04 | global batch size: 512 | lm loss: 2.201817E+00 | loss scale: 524288.0 | grad norm: 45378.544 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration    18800/  152972 | consumed samples: 4545984 | consumed tokens: 9310175232 | elapsed time per iteration (ms): 6018.8 | learning rate: 1.983E-04 | global batch size: 512 | lm loss: 2.199431E+00 | loss scale: 1048576.0 | grad norm: 97265.681 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration    19000/  152972 | consumed samples: 4648384 | consumed tokens: 9519890432 | elapsed time per iteration (ms): 6017.8 | learning rate: 1.983E-04 | global batch size: 512 | lm loss: 2.184064E+00 | loss scale: 1048576.0 | grad norm: 84813.184 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
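The counters in the iteration lines above are internally consistent. A minimal sketch of the check, using values copied from the iteration 18800 and 19000 entries (note: the 2048-token sequence length is an inference from the tokens/samples ratio, not something this log states directly):

```python
# Values copied from the iteration 18800 and 19000 log entries above.
samples_18800, tokens_18800 = 4545984, 9310175232
samples_19000, tokens_19000 = 4648384, 9519890432
logged_global_batch_size = 512

# Tokens per sample, i.e. the (inferred) training sequence length.
seq_len = tokens_19000 // samples_19000

# Samples consumed per iteration over the 200 iterations between the
# two logged points; at this stage of training it matches the logged
# global batch size (earlier, during batch-size ramp-up, it would not).
samples_per_iter = (samples_19000 - samples_18800) // 200

print(seq_len, samples_per_iter)  # 2048 512
```

The same ratio holds at every logged point in this chunk, which is a quick way to confirm the counters were not garbled in transit.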
-------------------------------------------------------------------------------------------------
 validation loss at iteration 19000 | lm loss value: 2.160727E+00 | lm loss PPL: 8.677444E+00 |
-------------------------------------------------------------------------------------------------
 iteration 19200/ 152972 | consumed samples: 4750784 | consumed tokens: 9729605632 | elapsed time per iteration (ms): 7048.4 | learning rate: 1.982E-04 | global batch size: 512 | lm loss: 2.202084E+00 | loss scale: 1048576.0 | grad norm: 87504.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration 19400/ 152972 | consumed samples: 4853184 | consumed tokens: 9939320832 | elapsed time per iteration (ms): 6028.4 | learning rate: 1.981E-04 | global batch size: 512 | lm loss: 2.198836E+00 | loss scale: 1048576.0 | grad norm: 93010.671 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 19500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-02 13:10:25,906] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/mp_rank_00_model_states.pt
[2021-11-02 13:10:25,944] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/mp_rank_01_model_states.pt
[2021-11-02 13:10:26,326] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_08_optim_states.pt
[2021-11-02 13:10:26,331] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_10_optim_states.pt
[2021-11-02 13:10:26,334] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_05_optim_states.pt [2021-11-02 13:10:26,334] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_04_optim_states.pt [2021-11-02 13:10:26,336] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_10_optim_states.pt [2021-11-02 13:10:26,337] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_11_optim_states.pt [2021-11-02 13:10:26,337] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_09_optim_states.pt [2021-11-02 13:10:26,338] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_05_optim_states.pt [2021-11-02 13:10:26,338] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_11_optim_states.pt [2021-11-02 13:10:26,340] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_07_optim_states.pt [2021-11-02 13:10:26,343] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_05_optim_states.pt [2021-11-02 13:10:26,344] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_08_optim_states.pt [2021-11-02 13:10:26,350] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_10_optim_states.pt [2021-11-02 13:10:26,355] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_04_optim_states.pt [2021-11-02 13:10:26,356] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_05_optim_states.pt [2021-11-02 13:10:26,357] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_06_optim_states.pt [2021-11-02 13:10:26,357] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_09_optim_states.pt [2021-11-02 13:10:26,362] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_04_optim_states.pt [2021-11-02 13:10:26,362] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_11_optim_states.pt [2021-11-02 13:10:26,363] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_08_optim_states.pt [2021-11-02 13:10:26,363] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_11_optim_states.pt [2021-11-02 13:10:26,364] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_10_optim_states.pt [2021-11-02 13:10:26,367] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_06_optim_states.pt [2021-11-02 13:10:26,368] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_07_optim_states.pt [2021-11-02 13:10:26,368] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_07_optim_states.pt [2021-11-02 13:10:26,369] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_06_optim_states.pt [2021-11-02 13:10:26,372] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_04_optim_states.pt [2021-11-02 13:10:26,372] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_08_optim_states.pt [2021-11-02 13:10:26,373] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_09_optim_states.pt [2021-11-02 13:10:26,373] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_09_optim_states.pt [2021-11-02 13:10:26,383] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_07_optim_states.pt [2021-11-02 13:10:26,383] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_06_optim_states.pt [2021-11-02 13:10:26,455] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-02 13:10:26,457] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_03_optim_states.pt [2021-11-02 13:10:26,458] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-02 13:10:26,459] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-02 13:10:26,466] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_14_optim_states.pt [2021-11-02 13:10:26,469] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_03_optim_states.pt [2021-11-02 13:10:26,469] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_12_optim_states.pt [2021-11-02 13:10:26,469] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-02 13:10:26,471] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_15_optim_states.pt [2021-11-02 13:10:26,472] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_13_optim_states.pt [2021-11-02 13:10:26,475] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_02_optim_states.pt [2021-11-02 13:10:26,476] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_12_optim_states.pt [2021-11-02 13:10:26,477] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_15_optim_states.pt [2021-11-02 13:10:26,479] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_15_optim_states.pt [2021-11-02 13:10:26,481] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_14_optim_states.pt [2021-11-02 13:10:26,483] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-02 13:10:26,488] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-02 13:10:26,502] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_02_optim_states.pt [2021-11-02 13:10:26,502] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_14_optim_states.pt [2021-11-02 13:10:26,505] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_12_optim_states.pt [2021-11-02 13:10:26,510] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_13_optim_states.pt [2021-11-02 13:10:26,510] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_15_optim_states.pt [2021-11-02 13:10:26,512] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_13_optim_states.pt [2021-11-02 13:10:26,512] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_03_optim_states.pt [2021-11-02 13:10:26,513] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_12_optim_states.pt [2021-11-02 13:10:26,513] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-02 13:10:26,515] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_13_optim_states.pt
[2021-11-02 13:10:26,516] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_03_optim_states.pt
[2021-11-02 13:10:26,516] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_02_optim_states.pt
[2021-11-02 13:10:26,516] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_14_optim_states.pt
[2021-11-02 13:10:26,517] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_02_optim_states.pt
[2021-11-02 13:10:26,518] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_01_optim_states.pt
 successfully saved checkpoint at iteration 19500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1209.82
 iteration 19600/ 152972 | consumed samples: 4955584 | consumed tokens: 10149036032 | elapsed time per iteration (ms): 6042.6 | learning rate: 1.980E-04 | global batch size: 512 | lm loss: 2.169471E+00 | loss scale: 524288.0 | grad norm: 46329.081 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 19679 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-02 13:28:28,379] [INFO]
[logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/mp_rank_00_model_states.pt [2021-11-02 13:28:28,423] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/mp_rank_01_model_states.pt [2021-11-02 13:28:28,776] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_11_optim_states.pt [2021-11-02 13:28:28,782] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_04_optim_states.pt [2021-11-02 13:28:28,782] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_09_optim_states.pt [2021-11-02 13:28:28,783] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_10_optim_states.pt [2021-11-02 13:28:28,784] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_05_optim_states.pt [2021-11-02 13:28:28,784] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_05_optim_states.pt [2021-11-02 13:28:28,785] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_08_optim_states.pt [2021-11-02 13:28:28,786] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_10_optim_states.pt [2021-11-02 13:28:28,787] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_10_optim_states.pt [2021-11-02 13:28:28,789] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_05_optim_states.pt [2021-11-02 13:28:28,790] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_05_optim_states.pt [2021-11-02 13:28:28,791] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_08_optim_states.pt [2021-11-02 13:28:28,792] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_04_optim_states.pt [2021-11-02 13:28:28,792] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_11_optim_states.pt [2021-11-02 13:28:28,800] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_04_optim_states.pt [2021-11-02 13:28:28,804] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_04_optim_states.pt [2021-11-02 13:28:28,805] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_11_optim_states.pt [2021-11-02 13:28:28,808] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_06_optim_states.pt [2021-11-02 13:28:28,812] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_09_optim_states.pt [2021-11-02 13:28:28,812] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_11_optim_states.pt [2021-11-02 13:28:28,813] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_10_optim_states.pt [2021-11-02 13:28:28,816] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_08_optim_states.pt [2021-11-02 13:28:28,816] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_07_optim_states.pt [2021-11-02 13:28:28,816] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_07_optim_states.pt [2021-11-02 13:28:28,817] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_06_optim_states.pt [2021-11-02 13:28:28,820] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_06_optim_states.pt [2021-11-02 13:28:28,823] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_06_optim_states.pt [2021-11-02 13:28:28,823] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_07_optim_states.pt [2021-11-02 13:28:28,824] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_07_optim_states.pt [2021-11-02 13:28:28,824] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_09_optim_states.pt [2021-11-02 13:28:28,827] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_08_optim_states.pt [2021-11-02 13:28:28,830] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_09_optim_states.pt [2021-11-02 13:28:28,909] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_14_optim_states.pt [2021-11-02 13:28:28,912] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_03_optim_states.pt [2021-11-02 13:28:28,914] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_03_optim_states.pt [2021-11-02 13:28:28,914] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-02 13:28:28,915] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-02 13:28:28,915] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_15_optim_states.pt [2021-11-02 13:28:28,915] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-02 13:28:28,916] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_13_optim_states.pt [2021-11-02 13:28:28,917] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_02_optim_states.pt [2021-11-02 13:28:28,919] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_15_optim_states.pt [2021-11-02 13:28:28,921] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_03_optim_states.pt [2021-11-02 13:28:28,922] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_13_optim_states.pt [2021-11-02 13:28:28,923] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_14_optim_states.pt [2021-11-02 13:28:28,923] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_15_optim_states.pt [2021-11-02 13:28:28,925] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_12_optim_states.pt
[2021-11-02 13:28:28,926] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_02_optim_states.pt
[2021-11-02 13:28:28,946] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_03_optim_states.pt
[2021-11-02 13:28:28,951] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_13_optim_states.pt
[2021-11-02 13:28:28,951] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_15_optim_states.pt
[2021-11-02 13:28:28,952] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_14_optim_states.pt
[2021-11-02 13:28:28,953] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_01_optim_states.pt
[2021-11-02 13:28:28,954] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_0_mp_rank_12_optim_states.pt
[2021-11-02 13:28:28,954] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_01_optim_states.pt
[2021-11-02 13:28:28,955] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_01_optim_states.pt
[2021-11-02 13:28:28,955] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_02_optim_states.pt
[2021-11-02 13:28:28,956] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_01_optim_states.pt
[2021-11-02 13:28:28,957] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_13_optim_states.pt
[2021-11-02 13:28:28,957] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_00_optim_states.pt
[2021-11-02 13:28:28,957] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_02_optim_states.pt
[2021-11-02 13:28:28,960] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_1_mp_rank_12_optim_states.pt
[2021-11-02 13:28:28,962] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_2_mp_rank_14_optim_states.pt
[2021-11-02 13:28:28,969] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step19679/zero_pp_rank_3_mp_rank_12_optim_states.pt
successfully saved checkpoint at iteration 19679 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1159.77
[exiting program after 1190.0665838718414 minutes] datetime: 2021-11-02 13:28:29
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
--------------------------------------------------
---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... [OKAY] ..................[OKAY] [OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op name op name op nameop name ................ ................ ................................ installed installedinstalled.. installedcompatible ...... -------------------------------------------------- compatiblecompatiblecompatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ...... cpu_adam[OKAY] ...............cpu_adamcpu_adam ...............[YES] [YES]..................... fused_adam ...... [YES] [OKAY] [OKAY]............. ...... [YES][OKAY] ...... [OKAY] fused_adam ............. fused_adamfused_lamb[YES] fused_adam ............. ...... .......................... [YES] [OKAY] [YES] [YES]...... ............ 
fused_lamb [OKAY][OKAY] [OKAY]............. [YES] ......fused_lamb [OKAY]fused_lamb ............. ............. [YES]sparse_attn [YES] ...... ............ ...... [OKAY] [NO] [OKAY] .......sparse_attn [OKAY]............ [NO] .......transformer [OKAY]............ [YES] sparse_attn......transformer sparse_attn ............[OKAY]........................ [NO][NO][YES] ....................stochastic_transformer [OKAY][OKAY][OKAY] . [YES] ...... stochastic_transformertransformer[OKAY]transformer ......................... [YES] [YES][YES] .................. [OKAY][OKAY][OKAY] stochastic_transformer stochastic_transformer. [YES]. ......[YES] [OKAY] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................. .................. .................. ..................[OKAY] [OKAY] [OKAY] --------------------------------------------------[OKAY] ----------------------------------------------------------------------------------------------------op name -------------------------------------------------- ................ op name op name installedop name ................ ................ .................. installed ..installedinstalledcompatible compatible--------------------------------------------------.. .. -------------------------------------------------- compatible compatible ---------------------------------------------------------------------------------------------------- cpu_adam ............... [YES] cpu_adam...... ...............[OKAY] cpu_adamcpu_adam[YES] .................................... [YES][OKAY][YES] ............ [OKAY]fused_adam [OKAY]............. fused_adam [YES]............. ......[YES] [OKAY]......fused_adam fused_adam[OKAY] fused_lamb ....................................... [YES]fused_lamb[YES][YES] ......................... ...... [YES] [OKAY][OKAY] [OKAY] ...... [OKAY]fused_lamb fused_lamb............. .............[YES] [YES]...... sparse_attn......[OKAY] ............[OKAY] sparse_attn [NO] ................... [NO][OKAY] ....... [OKAY] transformer sparse_attntransformer............ sparse_attn[YES]............ ............ ............ ......[NO] [YES] .......[NO] [OKAY] ......[OKAY]....... 
[OKAY] stochastic_transformer transformer [OKAY]. stochastic_transformer............ [YES].[YES] transformer[YES] .................. ............[OKAY] [YES][OKAY][OKAY] ...... [OKAY] stochastic_transformer .stochastic_transformer [YES]. ...... [YES][OKAY] ...... [OKAY] ninjaninjaninjaninja .................................... .................. ..................[OKAY][OKAY] --------------------------------------------------[OKAY][OKAY]-------------------------------------------------- op name--------------------------------------------------op name -------------------------------------------------- ................ ................ op name installedop nameinstalled .................... ................compatible installedcompatibleinstalled-------------------------------------------------- --------------------------------------------------.... compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adam ...............cpu_adam [YES]............... ......[YES] cpu_adamcpu_adam......[OKAY] ...............[OKAY]............... [YES][YES] ............ [OKAY][OKAY] fused_adam fused_adam............. .............[YES] [YES]...... ......[OKAY] [OKAY]fused_adam fused_adam fused_lamb ............. ............. .............fused_lamb [YES] [YES] [YES] ................... ...... ......[YES][OKAY][OKAY] [OKAY]...... [OKAY]fused_lamb fused_lamb .......................... [YES][YES] ............ [OKAY][OKAY] sparse_attn ............sparse_attn [NO]............ .......[NO] [OKAY]....... [OKAY] transformer sparse_attnsparse_attn............transformer ............[YES]........................ [NO] [YES][NO] ...... ....... ............. [OKAY][OKAY][OKAY][OKAY] transformertransformerstochastic_transformerstochastic_transformer .......................... [YES][YES] [YES] [YES].................. ......[OKAY][OKAY][OKAY] [OKAY] stochastic_transformer stochastic_transformer. .[YES] [YES]...... 
......[OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja--------------------------------------------------JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop name op name ................op name ................ 
................installed ................ installedinstalledinstalled.. ....compatible .. compatible--------------------------------------------------compatiblecompatible ------------------------------------------------------------------------------------------------------------------------------------------------------ cpu_adam ............... [YES] ...... [OKAY]cpu_adam cpu_adamcpu_adam ............... ..............................[YES] [YES]fused_adam...... [YES] ...................[OKAY] [OKAY]......[YES] [OKAY]...... [OKAY] fused_adam fused_adam.............fused_lamb ............. ............. fused_adam [YES][YES][YES] ............. ............[YES] ...... [OKAY]......[OKAY][OKAY] [OKAY] fused_lamb fused_lamb.............fused_lamb .............[YES] ............. sparse_attn[YES] ...... [YES].................. [OKAY] [NO]...... [OKAY] .......[OKAY] [OKAY] transformer ............ [YES]sparse_attn ...... ............sparse_attn[OKAY] ............sparse_attn[NO] stochastic_transformer[NO]............ ....... . [NO].......[YES][OKAY] [OKAY]...... ....... [OKAY] transformer [OKAY]transformer ........................ transformer[YES][YES] ........................ [OKAY][YES][OKAY] ...... [OKAY] stochastic_transformerstochastic_transformer ..stochastic_transformer [YES][YES]. ............[YES] [OKAY] [OKAY]...... [OKAY] ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop nameop name op name................ ................ ................ ................installed installed..installed installed.. compatible .. .. 
compatible--------------------------------------------------compatible compatible---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam cpu_adamcpu_adam...............cpu_adam ............... .............................. [YES] [YES]......[YES][YES] ...... ...... [OKAY] ......[OKAY][OKAY] [OKAY] fused_adamfused_adamfused_adamfused_adam .................................................... [YES][YES][YES][YES] ...... .................. [OKAY][OKAY][OKAY] [OKAY] fused_lamb fused_lambfused_lambfused_lamb............. ............. ............. [YES]............. [YES]......[YES] [YES] ...... [OKAY]...... ......[OKAY] [OKAY][OKAY] sparse_attn sparse_attn............sparse_attnsparse_attn [NO]............ ................... [NO]............ [NO] [OKAY] ....... [NO] .......[OKAY] transformer ....... [OKAY] ............ [OKAY]transformer [YES]transformer ..............................transformer [OKAY] [YES] [YES] ............ ...... ...... [YES]stochastic_transformer[OKAY] [OKAY] . ...... [YES]stochastic_transformer[OKAY] stochastic_transformer...... .[OKAY]. stochastic_transformer [YES][YES] ............. [OKAY][YES][OKAY] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninja ...................................................... [OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ op nameninja op nameop name ................ .................................. ................ installedinstalled[OKAY] installed.. .. ..compatible--------------------------------------------------compatible compatible----------------------------------------------------------------------------------------------------op name --------------------------------------------------................ installed .. compatible cpu_adam--------------------------------------------------cpu_adam cpu_adam.............................. [YES]...............[YES] ............ [YES] [OKAY] cpu_adam [OKAY] ..................... fused_adam[OKAY][YES] ................... [YES][OKAY] ...... [OKAY] fused_adam fused_lamb............. fused_adam fused_adam............. [YES] [YES]................... ...................[OKAY][YES] [YES] [OKAY] ...... ...... [OKAY]fused_lamb ............. [YES] [OKAY]...... fused_lamb [OKAY]............. sparse_attn[YES] .................. [NO][OKAY] ....... [OKAY] transformer sparse_attnfused_lamb............ ............ [YES][NO] ............. [OKAY][OKAY]............. sparse_attn ............stochastic_transformer transformer [NO] . ............ ....... [YES] [YES] [YES] [OKAY] ...... ...... 
[OKAY][OKAY] transformer .................. [OKAY]stochastic_transformer[YES] ....... [YES][OKAY] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op name op name op name ................ ................ op name................ installed installed ................installed .... installed .. compatiblecompatible .. compatible -------------------------------------------------- compatible-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam cpu_adam ............... ............... ...............[YES] cpu_adam[YES][YES]...... ......[OKAY] .....................[OKAY] [OKAY][YES] ...... [OKAY] fused_adam ............. [YES] fused_adamfused_adam...... ..........................[OKAY] [YES]fused_adam[YES] ......fused_lamb................... [OKAY].............[OKAY][YES] [YES]...... fused_lamb......fused_lamb ............. [OKAY].............[OKAY] [YES][YES] ............ [OKAY][OKAY] fused_lamb ............. [YES] sparse_attn...... ............ sparse_attn[OKAY][NO] sparse_attn ............................... [NO][OKAY] [NO] ....... .......[OKAY] transformer [OKAY] ............transformer [YES]transformersparse_attn............ .................. [YES] ............ [OKAY][YES] ...... [NO] ...... [OKAY] .......[OKAY] stochastic_transformer [OKAY].stochastic_transformer stochastic_transformer [YES] ........transformer [YES] [OKAY][YES] ............ ...... ...... [OKAY] [YES][OKAY] ...... 
[OKAY] stochastic_transformer . [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
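The report above notes that ops not built at install time are JIT-compiled with ninja on first use. To avoid JIT compilation latency at startup, DeepSpeed's `DS_BUILD_*` environment variables can prebuild ops during installation instead; a minimal sketch (op selection here mirrors the report above and assumes a CUDA toolkit matching your torch build is available):

```shell
# Prebuild selected DeepSpeed ops at install time instead of JIT-compiling them.
# DS_BUILD_OPS=1 would build all ops; per-op flags (shown) build a subset.
pip install ninja
DS_BUILD_FUSED_ADAM=1 DS_BUILD_FUSED_LAMB=1 pip install deepspeed --no-cache-dir

# Re-check op status with DeepSpeed's bundled report tool:
ds_report
```

Prebuilding trades a longer install for deterministic startup; JIT compilation remains the default and is fine when the build toolchain is present on the training nodes.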
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO]async_io ....... ...............[NO] [NO] ....... [NO] transformer_inference .. [NO] transformer_inference....... [OKAY].. [NO] ....... [OKAY]utils .................. [YES] ...... utils[OKAY] .................. [YES] ......quantizer [OKAY].............. [NO] ....... [OKAY] quantizer .............. --------------------------------------------------[NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO]  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. transformer_inference .. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] utils .................. [YES] ...... [OKAY] transformer_inference .. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. 
...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_ioasync_io .............................. [NO][NO] ....... .......[NO] [NO] transformer_inference ..transformer_inference [NO].. .......[NO] .......[OKAY] [OKAY] utils .................. utils[YES] ........................ [YES][OKAY] ...... [OKAY] quantizer .............. quantizer[NO] ..................... [NO][OKAY] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... async_io[NO] ...................... [NO][NO] ....... [NO] transformer_inference .. 
[NO] .......transformer_inference [OKAY].. [NO] ....... [OKAY]utils .................. [YES] ......utils [OKAY].................. [YES] ...... [OKAY]quantizer .............. [NO] .......quantizer [OKAY].............. [NO] .......-------------------------------------------------- [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io: please install the libaio-devel package with yumutils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inference transformer_inference.. ..[NO] [NO]....... .......[OKAY] [OKAY] utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizer ..............quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] DeepSpeed general environment info: transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] torch install path ............... 
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] quantizer .............. [NO] ....... [OKAY] torch version .................... 1.8.2 torch cuda version ............... 11.1 -------------------------------------------------- nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ...............async_io [NO] ...................... async_io[NO][NO] ...................... [NO][NO] ....... [NO] transformer_inference .. transformer_inference[NO] .........transformer_inference [NO][OKAY] ......... [NO][OKAY] .......utils [OKAY].................. [YES] utils...... ..................[OKAY] utils [YES] ........................ [YES]quantizer[OKAY] .................... [OKAY][NO] ....... quantizer[OKAY] ..............quantizer [NO].............. .......[NO]-------------------------------------------------- [OKAY]....... 
[OKAY]
--------------------------------------------------
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
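The warning above prescribes its own fix. A minimal sketch of the two suggested remedies follows; the package name comes from the warning itself, while the custom install prefix `/opt/libaio` and the rebuild command are illustrative assumptions (DeepSpeed's `DS_BUILD_AIO=1` flag forces the async_io op to be prebuilt):

```shell
# Option 1: install the libaio development headers (RHEL/CentOS family, per the warning)
sudo yum install -y libaio-devel

# Option 2: libaio was built from source into a custom prefix (hypothetical path);
# point the compiler and linker at it, then rebuild DeepSpeed with the aio op
export CFLAGS="-I/opt/libaio/include"
export LDFLAGS="-L/opt/libaio/lib"
DS_BUILD_AIO=1 pip install deepspeed --no-cache-dir
```

After rebuilding, `ds_report` should show `async_io ... [OKAY]` instead of `[NO]`. Note the warning is cosmetic here: async_io is only needed for NVMe offload, which this run does not use.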
[OKAY]
--------------------------------------------------
**** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss ****
using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................
1e-08 adlr_autoresume ................................. False adlr_autoresume_interval ........................ 1000 apply_query_key_layer_scaling ................... True apply_residual_connection_post_layernorm ........ False attention_dropout ............................... 0.1 attention_softmax_in_fp32 ....................... False bert_binary_head ................................ True bert_load ....................................... None bf16 ............................................ False bias_dropout_fusion ............................. True bias_gelu_fusion ................................ True biencoder_projection_dim ........................ 0 biencoder_shared_query_context_model ............ False block_data_path ................................. None checkpoint_activations .......................... True checkpoint_in_cpu ............................... False checkpoint_num_layers ........................... 1 clip_grad ....................................... 1.0 codecarbon_dir .................................. None consumed_train_samples .......................... 0 consumed_train_tokens ........................... 0 consumed_valid_samples .......................... 0 contigious_checkpointing ........................ False cpu_optimizer ................................... False cpu_torch_adam .................................. False curriculum_learning ............................. False data_impl ....................................... mmap data_parallel_size .............................. 4 data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document'] dataloader_type ................................. single DDP_impl ........................................ local decoder_seq_length .............................. None deepscale ....................................... False deepscale_config ................................ None deepspeed ....................................... 
True deepspeed_activation_checkpointing .............. True deepspeed_config ................................ ./ds_config.1809761.json deepspeed_mpi ................................... False distribute_checkpointed_activations ............. False distributed_backend ............................. nccl embedding_path .................................. None encoder_seq_length .............................. 2048 eod_mask_loss ................................... False eval_interval ................................... 1000 eval_iters ...................................... 100 evidence_data_path .............................. None exit_duration_in_mins ........................... 1190 exit_interval ................................... None ffn_hidden_size ................................. 8192 finetune ........................................ False fp16 ............................................ True fp16_lm_cross_entropy ........................... False fp32_residual_connection ........................ False gigaflos_no_embeds .............................. 0 global_batch_size ............................... 512 glu_activation .................................. None hidden_dropout .................................. 0.1 hidden_size ..................................... 2048 hysteresis ...................................... 2 ict_head_size ................................... None ict_load ........................................ None img_dim ......................................... 224 indexer_batch_size .............................. 128 indexer_log_interval ............................ 1000 init_method_std ................................. 0.02 init_method_xavier_uniform ...................... False initial_loss_scale .............................. 4294967296 kv_channels ..................................... 128 layernorm_epsilon ............................... 1e-05 lazy_mpu_init ................................... 
None load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints local_rank ...................................... 0 log_batch_size_to_tensorboard ................... True log_interval .................................... 200 log_learning_rate_to_tensorboard ................ True log_level ....................................... None log_level_replica ............................... None log_loss_scale_to_tensorboard ................... True log_num_zeros_in_grad ........................... False log_params_norm ................................. False log_timers_to_tensorboard ....................... True log_validation_ppl_to_tensorboard ............... True loss_on_targets_only ............................ True loss_scale ...................................... 12.0 loss_scale_window ............................... 1000 lr .............................................. 0.0002 lr_decay_iters .................................. None lr_decay_samples ................................ 73242187 lr_decay_style .................................. cosine lr_decay_tokens ................................. None lr_warmup_fraction .............................. None lr_warmup_iters ................................. 0 lr_warmup_samples ............................... 183105 make_vocab_size_divisible_by .................... 128 mask_prob ....................................... 0.15 masked_softmax_fusion ........................... True max_position_embeddings ......................... 2048 memory_centric_tiled_linear ..................... False merge_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/data/gpt2-merges.txt micro_batch_size ................................ 8 min_loss_scale .................................. 1.0 min_lr .......................................... 1e-05 mmap_warmup ..................................... 
False no_load_optim ................................... None no_load_rng ..................................... None no_save_optim ................................... None no_save_rng ..................................... None num_attention_heads ............................. 16 num_channels .................................... 3 num_classes ..................................... 1000 num_layers ...................................... 24 num_layers_per_virtual_pipeline_stage ........... None num_workers ..................................... 2 onnx_safe ....................................... None openai_gelu ..................................... False optimizer ....................................... adam override_lr_scheduler ........................... False params_dtype .................................... torch.float16 partition_activations ........................... False patch_dim ....................................... 16 pipeline_model_parallel_size .................... 4 position_embedding_type ......................... PositionEmbeddingType.absolute profile_backward ................................ False query_in_block_prob ............................. 0.1 rampup_batch_size ............................... ['32', '32', '2_000_000'] rank ............................................ 0 remote_device ................................... none reset_attention_mask ............................ False reset_position_ids .............................. False retriever_report_topk_accuracies ................ [] retriever_score_scaling ......................... False retriever_seq_length ............................ 256 sample_rate ..................................... 1.0 save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints save_interval ................................... 1500 scatter_gather_tensors_in_pipeline .............. True scattered_embeddings ............................ 
False seed ............................................ 1234 seq_length ...................................... 2048 sgd_momentum .................................... 0.9 short_seq_prob .................................. 0.1 split ........................................... 949,50,1 split_transformers .............................. False synchronize_each_layer .......................... False tensor_model_parallel_size ...................... 4 tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/tr6-1B3-prefix-lm-unbiased-loss-logs/tensorboard tensorboard_log_interval ........................ 1 tensorboard_queue_size .......................... 5 tile_factor ..................................... 1 titles_data_path ................................ None tokenizer_name_or_path .......................... None tokenizer_type .................................. GPT2BPETokenizer train_iters ..................................... None train_samples ................................... 73242187 train_tokens .................................... None use_bnb_optimizer ............................... False use_checkpoint_lr_scheduler ..................... False use_contiguous_buffers_in_ddp ................... False use_cpu_initialization .......................... None use_one_sent_docs ............................... False use_pin_memory .................................. False virtual_pipeline_model_parallel_size ............ None vocab_extra_ids ................................. 0 vocab_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/data/gpt2-vocab.json weight_decay .................................... 0.1 world_size ...................................... 64 zero_allgather_bucket_size ...................... 0.0 zero_contigious_gradients ....................... False zero_reduce_bucket_size ......................... 
0.0 zero_reduce_scatter ............................. False zero_stage ...................................... 1 -------------------- end of arguments --------------------- will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples. > building GPT2BPETokenizer tokenizer ... **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** > padded vocab (size: 50257) with 431 dummy tokens (new size: 50688) > initializing torch distributed ... **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** > setting tensorboard ... > initializing tensor model parallel with size 4 > initializing pipeline model parallel with size 4 > setting random seeds to 1234 ... 
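The numbers in the dump above are internally consistent, and it can be worth verifying that when auditing a run. A small sketch (outside the log; helper names are our own, not Megatron's, and the rampup step logic is a plausible reading of the `['32', '32', '2_000_000']` schedule rather than Megatron's exact code) reproducing the arithmetic behind the world size, the padded vocab, and the batch-size rampup:

```python
# Sanity-check values from the argument dump. Helper names are illustrative.

def padded_vocab_size(vocab_size, divisible_by, tensor_parallel_size):
    """Pad the vocab so each tensor-parallel shard is a multiple of `divisible_by`."""
    multiple = divisible_by * tensor_parallel_size  # 128 * 4 = 512
    return ((vocab_size + multiple - 1) // multiple) * multiple

def rampup_batch_size(consumed_samples, start, increment, ramp_samples, final):
    """Global batch size after `consumed_samples`, ramping from `start` to `final`
    in steps of `increment` spread over `ramp_samples` samples (assumed scheme)."""
    steps = final // increment - start // increment          # (512 - 32) / 32 = 15 steps
    samples_per_step = ramp_samples // steps
    return min(final, start + increment * (consumed_samples // samples_per_step))

# world size 64 = data-parallel 4 x tensor-parallel 4 x pipeline-parallel 4
assert 4 * 4 * 4 == 64

# "padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)"
assert padded_vocab_size(50257, 128, 4) == 50688
assert 50688 - 50257 == 431

# rampup from global batch 32 to 512 in increments of 32 over 2_000_000 samples
assert rampup_batch_size(0, 32, 32, 2_000_000, 512) == 32
assert rampup_batch_size(2_000_000, 32, 32, 2_000_000, 512) == 512
```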
[2021-11-03 09:44:05,097] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.341 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
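The seed line above can be reproduced from the base seed. In this Megatron version the model-parallel CUDA seed is the base seed plus a fixed offset plus the tensor-parallel rank, while data-parallel replicas keep the base seed; the offset value 2718 below is read off the logged numbers (3952 - 1234), treat it as an assumption rather than a stable contract:

```python
# Reproduce the seeds in the checkpointing.py log line (assumed scheme:
# base seed + fixed offset + tensor-parallel rank; offset inferred from the log).

def model_parallel_seed(seed, tensor_parallel_rank, offset=2718):
    """Per-tensor-rank seed so parallel shards draw different dropout masks."""
    return seed + offset + tensor_parallel_rank

def data_parallel_seed(seed):
    """Data-parallel replicas intentionally share the base seed."""
    return seed

# "model parallel seed: 3952 and data parallel seed: 1234" on tensor rank 0
assert model_parallel_seed(1234, 0) == 3952
assert data_parallel_seed(1234) == 1234
```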
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. 
>>> done with compiling and loading fused kernels. Compilation time: 25.497 seconds
time to initialize megatron (seconds): -16.999
[after megatron is initialized] datetime: 2021-11-03 09:44:31
building GPT model ...
[2021-11-03 09:44:31,043] [INFO] [utils.py:806:see_memory_usage] Before Building Model
[2021-11-03 09:44:31,044] [INFO] [utils.py:807:see_memory_usage] MA 0.0 GB  Max_MA 0.0 GB  CA 0.0 GB  Max_CA 0 GB
[2021-11-03 09:44:31,045] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 41.24 GB, percent = 22.0%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=1, data=0, model=0): 16, ProcessCoord(pipe=1, data=0, model=1): 17, ProcessCoord(pipe=1, data=0, model=2): 18, ProcessCoord(pipe=1, data=0, model=3): 19, ProcessCoord(pipe=1, data=1, model=0): 20, ProcessCoord(pipe=1, data=1, model=1): 21, ProcessCoord(pipe=1, data=1, model=2): 22, ProcessCoord(pipe=1, data=1, model=3): 23, ProcessCoord(pipe=1, data=2, model=0): 24, ProcessCoord(pipe=1, data=2, model=1): 25, ProcessCoord(pipe=1, data=2, model=2): 26, ProcessCoord(pipe=1, data=2, model=3): 27, ProcessCoord(pipe=1, data=3, model=0): 28, ProcessCoord(pipe=1, data=3, model=1): 29, ProcessCoord(pipe=1, data=3, model=2): 30, ProcessCoord(pipe=1, data=3, model=3): 31, ProcessCoord(pipe=2, data=0, model=0): 32, ProcessCoord(pipe=2, data=0, model=1): 33, ProcessCoord(pipe=2, data=0, model=2): 34, ProcessCoord(pipe=2, data=0, model=3): 35, ProcessCoord(pipe=2, data=1, model=0): 36, ProcessCoord(pipe=2, data=1, model=1): 37, ProcessCoord(pipe=2, data=1, model=2): 38, ProcessCoord(pipe=2, data=1, model=3): 39, ProcessCoord(pipe=2, data=2, model=0): 40, ProcessCoord(pipe=2, data=2, model=1): 41, ProcessCoord(pipe=2, data=2, model=2): 42, ProcessCoord(pipe=2, data=2, model=3): 43, ProcessCoord(pipe=2, data=3, model=0): 44, ProcessCoord(pipe=2, data=3, model=1): 45, ProcessCoord(pipe=2, data=3, model=2): 46, ProcessCoord(pipe=2, data=3, model=3): 47, ProcessCoord(pipe=3, data=0, model=0): 48, ProcessCoord(pipe=3, data=0, model=1): 49, ProcessCoord(pipe=3, data=0, model=2): 50, ProcessCoord(pipe=3, data=0, model=3): 51, ProcessCoord(pipe=3, data=1, model=0): 52, ProcessCoord(pipe=3, data=1, model=1): 53, ProcessCoord(pipe=3, data=1, model=2): 54, ProcessCoord(pipe=3, data=1, model=3): 55, ProcessCoord(pipe=3, data=2, model=0): 56, ProcessCoord(pipe=3, data=2, model=1): 57, ProcessCoord(pipe=3, data=2, model=2): 58, ProcessCoord(pipe=3, data=2, model=3): 59, ProcessCoord(pipe=3, data=3, model=0): 60, ProcessCoord(pipe=3, data=3, model=1): 61, ProcessCoord(pipe=3, data=3, model=2): 62, ProcessCoord(pipe=3, data=3, model=3): 63}
[2021-11-03 09:44:31,567] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
     0: _to_float16
     1: EmbeddingPipe
     2: <lambda>
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
stage=1 layers=6
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=2 layers=6
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=3 layers=11
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
    27: <lambda>
    28: <lambda>
    29: MixedFusedLayerNorm
    30: EmbeddingPipe
    31: float16_to_fp32
  loss: CrossEntropy
 > number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (1, 3): 105743360
 > number of parameters on (tensor, pipeline) model parallel rank (1, 0): 105739264
 > number of parameters on (tensor, pipeline) model parallel rank (3, 3): 105743360
 > number of parameters on (tensor, pipeline) model parallel rank (3, 0): 105739264
 > number of parameters on (tensor, pipeline) model parallel rank (2, 3): 105743360
 > number of parameters on (tensor, pipeline) model parallel rank (2, 0): 105739264
 > number of parameters on (tensor, pipeline) model parallel rank (0, 3): 105743360
[2021-11-03 09:44:31,937] [INFO] [utils.py:806:see_memory_usage] After Building Model
[2021-11-03 09:44:31,938] [INFO] [utils.py:807:see_memory_usage] MA 0.21 GB  Max_MA 0.21 GB  CA 0.22 GB  Max_CA 0 GB
[2021-11-03 09:44:31,938] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 41.48 GB, percent = 22.2%
 > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 105739264
setting training iterations to 152972
 > learning rate decay style: cosine
DeepSpeed is enabled.
[2021-11-03 09:44:31,957] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.5.5+58a8e13, git-hash=58a8e13, git-branch=master
[2021-11-03 09:44:32,028] [INFO] [engine.py:207:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-11-03 09:44:32,028] [INFO] [engine.py:862:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-11-03 09:44:32,029] [INFO] [engine.py:868:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-11-03 09:44:32,030] [INFO] [engine.py:884:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-11-03 09:44:32,030] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-11-03 09:44:32,030] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-11-03 09:44:32,030] [INFO] [stage2.py:111:__init__] Reduce bucket size 500000000
[2021-11-03 09:44:32,030] [INFO] [stage2.py:112:__init__] Allgather bucket size 500000000
[2021-11-03 09:44:32,030] [INFO] [stage2.py:113:__init__] CPU Offload: False
[2021-11-03 09:44:32,030] [INFO] [stage2.py:114:__init__] Round robin gradient partitioning: False
Rank: 25 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 30 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 18 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 31 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 33 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 17 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 41 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 26 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 42 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 37 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 35 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 29 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 16 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 28 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 38 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 45 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 19 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 32 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 40 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 36 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 27 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 34 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 21 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 20 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 22 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 23 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 46 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 24 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 44 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 39 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 43 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 47 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 11 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 49 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 54 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 3 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 48 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 7 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 59 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 62 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 10 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 15 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 53 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 52 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 2 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 57 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 55 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 6 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 12 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 60 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 56 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 50 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 51 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 0 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 61 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 63 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 14 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 9 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 1 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 5 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 58 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 8 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 13 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 4 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
[2021-11-03 09:44:32,335] [INFO] [utils.py:806:see_memory_usage] Before initializing optimizer states
[2021-11-03 09:44:32,336] [INFO] [utils.py:807:see_memory_usage] MA 0.3 GB  Max_MA 0.35 GB  CA 0.59 GB  Max_CA 1 GB
[2021-11-03 09:44:32,336] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 42.27 GB, percent = 22.6%
[2021-11-03 09:44:32,363] [INFO] [utils.py:806:see_memory_usage] After initializing optimizer states
[2021-11-03 09:44:32,364] [INFO] [utils.py:807:see_memory_usage] MA 0.49 GB  Max_MA 0.59 GB  CA 0.89 GB  Max_CA 1 GB
[2021-11-03 09:44:32,364] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 42.27 GB, percent = 22.6%
[2021-11-03 09:44:32,364] [INFO] [stage2.py:474:__init__] optimizer state initialized
[2021-11-03 09:44:32,396] [INFO] [utils.py:806:see_memory_usage] After initializing ZeRO optimizer
[2021-11-03 09:44:32,397] [INFO] [utils.py:807:see_memory_usage] MA 0.49 GB  Max_MA 0.49 GB  CA 0.89 GB  Max_CA 1 GB
[2021-11-03 09:44:32,397] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 42.27 GB, percent = 22.6%
[2021-11-03 09:44:32,397] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-11-03 09:44:32,398] [INFO] [engine.py:599:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-11-03 09:44:32,398] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-11-03 09:44:32,398] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-11-03 09:44:32,398] [INFO] [config.py:940:print] DeepSpeedEngine configuration:
[2021-11-03 09:44:32,398] [INFO] [config.py:944:print]   activation_checkpointing_config  { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2021-11-03 09:44:32,398] [INFO] [config.py:944:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-11-03 09:44:32,398] [INFO] [config.py:944:print]   allreduce_always_fp32 ........ False
[2021-11-03 09:44:32,398] [INFO] [config.py:944:print]   amp_enabled .................. False
[2021-11-03 09:44:32,398] [INFO] [config.py:944:print]   amp_params ................... False
[2021-11-03 09:44:32,398] [INFO] [config.py:944:print]   checkpoint_tag_validation_enabled  True
[2021-11-03 09:44:32,398] [INFO] [config.py:944:print]   checkpoint_tag_validation_fail  False
[2021-11-03 09:44:32,398] [INFO] [config.py:944:print]   curriculum_enabled ........... False
[2021-11-03 09:44:32,398] [INFO] [config.py:944:print]   curriculum_params ............ False
[2021-11-03 09:44:32,398] [INFO] [config.py:944:print]   dataloader_drop_last ......... False
[2021-11-03 09:44:32,398] [INFO] [config.py:944:print]   disable_allgather ............ False
[2021-11-03 09:44:32,398] [INFO] [config.py:944:print]   dump_state ................... False
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   eigenvalue_enabled ........... False
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   eigenvalue_gas_boundary_resolution  1
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   eigenvalue_layer_num ......... 0
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   eigenvalue_max_iter .......... 100
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   eigenvalue_stability ......... 1e-06
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   eigenvalue_tol ............... 0.01
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   eigenvalue_verbose ........... False
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   elasticity_enabled ........... False
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   fp16_enabled ................. True
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   fp16_master_weights_and_gradients  False
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   fp16_mixed_quantize .......... False
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   global_rank .................. 0
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   gradient_accumulation_steps .. 16
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   gradient_clipping ............ 1.0
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   gradient_predivide_factor .... 1.0
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   initial_dynamic_scale ........ 4096
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   loss_scale ................... 0
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   memory_breakdown ............. False
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   optimizer_legacy_fusion ...... False
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   optimizer_name ............... None
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   optimizer_params ............. None
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   pld_enabled .................. False
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   pld_params ................... False
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   prescale_gradients ........... False
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   quantize_change_rate ......... 0.001
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   quantize_groups .............. 1
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   quantize_offset .............. 1000
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   quantize_period .............. 1000
[2021-11-03 09:44:32,399] [INFO] [config.py:944:print]   quantize_rounding ............ 0
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   quantize_start_bits .......... 16
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   quantize_target_bits ......... 8
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   quantize_training_enabled .... False
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   quantize_type ................ 0
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   quantize_verbose ............. False
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   scheduler_name ............... None
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   scheduler_params ............. None
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   sparse_attention ............. None
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   sparse_gradients_enabled ..... False
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   steps_per_print .............. 2000
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   tensorboard_enabled .......... False
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   tensorboard_job_name ......... DeepSpeedJobName
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   tensorboard_output_path ......
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   train_batch_size ............. 512
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   train_micro_batch_size_per_gpu  8
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   use_quantizer_kernel ......... False
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   wall_clock_breakdown ......... False
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   world_size ................... 4
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   zero_allow_untested_optimizer  False
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   zero_config .................. { "stage": 1, "contiguous_gradients": true, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   zero_enabled ................. True
[2021-11-03 09:44:32,400] [INFO] [config.py:944:print]   zero_optimization_stage ...... 1
[2021-11-03 09:44:32,400] [INFO] [config.py:946:print]   json = { "train_micro_batch_size_per_gpu": 8, "train_batch_size": 512, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false }
[2021-11-03 09:44:32,400] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=16 micro_batch_size=8
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=105739264 (105.739M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=105739264 (105.739M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=105739264 (105.739M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=105739264 (105.739M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=48 STAGE=3 LAYERS=11 [21, 32) STAGE_PARAMS=105743360 (105.743M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=51 STAGE=3 LAYERS=11 [21, 32) STAGE_PARAMS=105743360 (105.743M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=50 STAGE=3 LAYERS=11 [21, 32) STAGE_PARAMS=105743360 (105.743M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-03 09:44:32,690] [INFO] [engine.py:151:__init__] RANK=49 STAGE=3 LAYERS=11 [21, 32) STAGE_PARAMS=105743360 (105.743M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
 > using checkpoint value 0.0002 for learning rate
 > using checkpoint value 1e-05 for minimum learning rate
 > using checkpoint value 183105 for warmup iterations
 > using checkpoint value 73242187 for total number of iterations
 > using checkpoint value cosine for decay style
successfully loaded 4 ZeRO state_dicts for rank 32
successfully loaded 4 ZeRO state_dicts for rank 47
successfully loaded 4 ZeRO state_dicts for rank 45
successfully loaded 4 ZeRO state_dicts for rank 35
successfully loaded 4 ZeRO state_dicts for rank 29
successfully loaded 4 ZeRO state_dicts for rank 33
successfully loaded 4 ZeRO state_dicts for rank 41
successfully loaded 4 ZeRO state_dicts for rank 21
successfully loaded 4 ZeRO state_dicts for rank 28
loading 4 zero partition checkpoints for rank 47
successfully loaded 4 ZeRO state_dicts for rank 36
successfully loaded 4 ZeRO state_dicts for rank 31
successfully loaded 4 ZeRO state_dicts for rank 62
successfully loaded 4 ZeRO state_dicts for rank 42
successfully loaded 4 ZeRO state_dicts for rank 18
successfully loaded 4 ZeRO state_dicts for rank 16
successfully loaded 4 ZeRO state_dicts for rank 40
loading 4 zero partition checkpoints for rank 32
successfully loaded 4 ZeRO state_dicts for rank 23
successfully loaded 4 ZeRO state_dicts for rank 24
successfully loaded 4 ZeRO state_dicts for rank 20
successfully loaded 4 ZeRO state_dicts for rank 9
successfully loaded 4 ZeRO state_dicts for rank 54
loading 4 zero partition checkpoints for rank 45
successfully loaded 4 ZeRO state_dicts for rank 34
successfully loaded 4 ZeRO state_dicts for rank 43
loading 4 zero partition checkpoints for rank 35
successfully loaded 4 ZeRO state_dicts for rank 38
successfully loaded 4 ZeRO state_dicts for rank 46
successfully loaded 4 ZeRO state_dicts for rank 39
successfully loaded 4 ZeRO state_dicts for rank 13
successfully loaded 4 ZeRO state_dicts for rank 26
loading 4 zero partition checkpoints for rank 33
successfully loaded 4 ZeRO state_dicts for rank 30
loading 4 zero partition checkpoints for rank 21
successfully loaded 4 ZeRO state_dicts for rank 37
successfully loaded 4 ZeRO state_dicts for rank 44
loading 4 zero partition checkpoints for rank 28
loading 4 zero partition checkpoints for rank 29
loading 4 zero partition checkpoints for rank 31
loading 4 zero partition checkpoints for rank 23
successfully loaded 4 ZeRO state_dicts for rank 5
loading 4 zero partition checkpoints for rank 20
loading 4 zero partition checkpoints for rank 16
loading 4 zero partition checkpoints for rank 18
successfully loaded 4 ZeRO state_dicts for rank 25
successfully loaded 4 ZeRO state_dicts for rank 12
loading 4 zero partition checkpoints for rank 34
successfully loaded 4 ZeRO state_dicts for rank 60
loading 4 zero partition checkpoints for rank 41
loading 4 zero partition checkpoints for rank 42
loading 4 zero partition checkpoints for rank 40
loading 4 zero partition checkpoints for rank 43
successfully loaded 4 ZeRO state_dicts for rank 19
loading 4 zero partition checkpoints for rank 38
loading 4 zero partition checkpoints for rank 36
loading 4 zero partition checkpoints for rank 46
successfully loaded 4 ZeRO state_dicts for rank 17
loading 4 zero partition checkpoints for rank 39
loading 4 zero partition checkpoints for rank 24
successfully loaded 4 ZeRO state_dicts for rank 10
successfully loaded 4 ZeRO state_dicts for rank 48
successfully loaded 4 ZeRO state_dicts for rank 52
loading 4 zero partition checkpoints for rank 26
successfully loaded 4 ZeRO state_dicts for rank 8
successfully loaded 4 ZeRO state_dicts for rank 59
successfully loaded 4 ZeRO state_dicts for rank 53
successfully loaded 4 ZeRO state_dicts for rank 7
successfully loaded 4 ZeRO state_dicts for rank 27
loading 4 zero partition checkpoints for rank 30
successfully loaded 4 ZeRO state_dicts for rank 49
loading 4 zero partition checkpoints for rank 62
loading 4 zero partition checkpoints for rank 44
loading 4 zero partition checkpoints for rank 37
successfully loaded 4 ZeRO state_dicts for rank 51
loading 4 zero partition checkpoints for rank 9
loading 4 zero partition checkpoints for rank 54
successfully loaded 4 ZeRO state_dicts for rank 61
loading 4 zero partition checkpoints for rank 13
successfully loaded 4 ZeRO state_dicts for rank 3
successfully loaded 4 ZeRO state_dicts for rank 1
loading 4 zero partition checkpoints for rank 25 successfully loaded 4 ZeRO state_dicts for rank 22 successfully loaded 4 ZeRO state_dicts for rank 15 loading 4 zero partition checkpoints for rank 19 successfully loaded 4 ZeRO state_dicts for rank 11 successfully loaded 4 ZeRO state_dicts for rank 56 successfully loaded 4 ZeRO state_dicts for rank 55 loading 4 zero partition checkpoints for rank 5 successfully loaded 4 ZeRO state_dicts for rank 63 loading 4 zero partition checkpoints for rank 17 successfully loaded 4 ZeRO state_dicts for rank 58 successfully loaded 4 ZeRO state_dicts for rank 57 loading 4 zero partition checkpoints for rank 27 loading 4 zero partition checkpoints for rank 12 loading 4 zero partition checkpoints for rank 60 successfully loaded 4 ZeRO state_dicts for rank 50 loading 4 zero partition checkpoints for rank 8 loading 4 zero partition checkpoints for rank 10 loading 4 zero partition checkpoints for rank 52 successfully loaded 4 ZeRO state_dicts for rank 0 loading 4 zero partition checkpoints for rank 7 successfully loaded 4 ZeRO state_dicts for rank 4 successfully loaded 4 ZeRO state_dicts for rank 14 loading 4 zero partition checkpoints for rank 53 loading 4 zero partition checkpoints for rank 48 loading 4 zero partition checkpoints for rank 22 loading 4 zero partition checkpoints for rank 49 loading 4 zero partition checkpoints for rank 59 loading 4 zero partition checkpoints for rank 51 successfully loaded 4 ZeRO state_dicts for rank 2 loading 4 zero partition checkpoints for rank 61 loading 4 zero partition checkpoints for rank 3loading 4 zero partition checkpoints for rank 1 loading 4 zero partition checkpoints for rank 11 loading 4 zero partition checkpoints for rank 63 loading 4 zero partition checkpoints for rank 15 loading 4 zero partition checkpoints for rank 55 loading 4 zero partition checkpoints for rank 56 loading 4 zero partition checkpoints for rank 57 loading 4 zero partition checkpoints for rank 58 loading 4 zero 
partition checkpoints for rank 50 loading 4 zero partition checkpoints for rank 0 checkpoint version 3.0 loading 4 zero partition checkpoints for rank 4 loading 4 zero partition checkpoints for rank 14 loading 4 zero partition checkpoints for rank 2 successfully loaded 4 ZeRO state_dicts for rank 6 loading 4 zero partition checkpoints for rank 6 successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints at iteration 19679 time (ms) | load-checkpoint: 3617.28 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as 
the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") 
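The config and engine lines above pin down the run's parallelism layout. A quick sketch (values read off the log; the variable names are illustrative, not DeepSpeed API) checks that the batch-size arithmetic is self-consistent:

```python
# Sanity-check of the batch-size arithmetic implied by the logged config.
micro_batch_size = 8    # "train_micro_batch_size_per_gpu" in the config
micro_batches = 16      # gradient accumulation steps, from the engine log
train_batch_size = 512  # "train_batch_size" in the config

# DeepSpeed requires train_batch_size ==
#   micro_batch_size * gradient_accumulation_steps * data_parallel_size,
# so the data-parallel degree must be:
data_parallel_size = train_batch_size // (micro_batch_size * micro_batches)
print(data_parallel_size)  # → 4

# With 64 ranks in total (RANK=0..63 in the log) and 4 pipeline stages
# (STAGE=0..3), the remaining factor is the tensor-parallel degree:
world_size = 64
pipeline_parallel_size = 4
tensor_parallel_size = world_size // (pipeline_parallel_size * data_parallel_size)
print(tensor_parallel_size)  # → 4
```

This matches the log: each pipeline stage is replicated across 16 ranks, i.e. a 4-way tensor-parallel by 4-way data-parallel grid.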
estimated model parameters: 1.209483264
estimated model parameters: 1.691828224
estimated model parameters: 1.69189376
estimated model parameters without embeddings: 1.209483264
estimated model parameters without embeddings: 1.2095488
[the same UserWarning is emitted once per rank; identical repeats omitted]
estimated model parameters: 1.209483264 / 1.691828224 / 1.69189376 (one line per rank; the values differ across pipeline stages because the first and last stages hold extra embedding copies; repeats omitted)
estimated model parameters without embeddings: 1.209483264 / 1.2095488 (one line per rank; repeats omitted)
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-11-03 09:44:36
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      73242187
    validation: 7833600
    test:       51200
> building train, validation, and test datasets for GPT ...
> building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
> finished creating indexed dataset in 0.165370 seconds
    number of documents: 304230423
> dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.222 seconds
    total number of samples: 131537224
    total number of epochs: 1
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from
/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.228 seconds
    total number of samples: 13854322
    total number of epochs: 2
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.067 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-11-03 09:44:42
done with setup ...
training ...
time (ms) | model-and-optimizer-setup: 5387.97 | train/valid/test-data-iterators-setup: 5800.94
Number of parameters: 1.209483264 billion / 1.691828224 billion / 1.69189376 billion (one line per rank; repeats omitted)
Number of parameters without embeddings: 1.209483264 billion / 1.2095488 billion (one line per rank; repeats omitted)
[before the start of training step] datetime: 2021-11-03 09:44:43
[2021-11-03 09:44:43,759] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information
[2021-11-03 09:44:43,759] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-11-03 09:44:43,759] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-11-03 09:44:43,759] [INFO] [checkpointing.py:554:forward] ----Synchronization False
[2021-11-03 09:44:43,759] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False
[Rank 48] (after 19800 iterations) memory (MB) | allocated: 2393.11083984375 | max allocated: 4380.0517578125 | reserved: 7464.0 | max reserved: 7464.0
[Rank 51] (after 19800 iterations) memory (MB) | allocated: 2393.11083984375 | max allocated: 4379.5517578125 | reserved: 6848.0 | max reserved: 6848.0
[Rank 19] (after 19800 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3432.89208984375 | reserved: 4742.0 | max reserved: 4742.0
[Rank 35] (after 19800 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3000.8916015625 | reserved: 4326.0 | max reserved: 4326.0
[Rank 3] (after 19800 iterations) memory (MB) | allocated: 681.95849609375 | max allocated: 4309.45458984375 | reserved: 5728.0 | max reserved: 5728.0
[Rank 16] (after 19800 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3432.89208984375 | reserved: 4710.0 | max reserved: 4710.0
[Rank 32] (after 19800 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3000.8916015625 | reserved: 4278.0 | max reserved: 4278.0
[Rank 0] (after 19800 iterations) memory (MB) | allocated: 681.95849609375 | max allocated: 4309.45458984375 | reserved: 5840.0 | max reserved: 5840.0
iteration 19800/ 152972 | consumed samples: 5057984 | consumed tokens: 10358751232 | elapsed time per iteration (ms): 6364.9 | learning rate: 1.979E-04 | global batch size: 512 | lm loss: 2.183335E+00 | loss scale: 524288.0 | grad norm: 37428.878 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[Rank 17] (after 19800 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3432.89208984375 | reserved: 4918.0 | max reserved: 4918.0
[Rank 1] (after 19800 iterations) memory (MB) | allocated: 681.95849609375 | max allocated: 4309.45458984375 | reserved: 5712.0 | max reserved: 5712.0
[Rank 33] (after 19800 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3000.8916015625 | reserved: 4342.0 | max reserved: 4342.0
[Rank 49] (after 19800 iterations) memory (MB) | allocated: 2393.11083984375 | max allocated: 4379.5517578125 | reserved: 7464.0 | max reserved: 7464.0
[Rank 34] (after 19800 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3000.8916015625 | reserved: 4326.0 | max reserved: 4326.0
[Rank 2] (after 19800 iterations) memory (MB) | allocated: 681.95849609375 | max allocated: 4309.45458984375 | reserved: 5728.0 | max reserved: 5728.0
[Rank 50] (after 19800 iterations) memory (MB) | allocated: 2393.11083984375 | max allocated: 4380.0517578125 | reserved: 6848.0 | max reserved: 6848.0
[Rank 18] (after 19800 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3432.89208984375 | reserved: 4790.0 | max reserved:
4790.0
[2021-11-03 10:18:18,209] [INFO] [logging.py:68:log_dist] [Rank 0] step=20000, skipped=33, lr=[0.0001978401275310349, 0.0001978401275310349], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 20000 loss: 2.0978 iter time (s): 0.003 samples/sec: 163228.888
iteration 20000/ 152972 | consumed samples: 5160384 | consumed tokens: 10568466432 | elapsed time per iteration (ms): 6225.3 | learning rate: 1.978E-04 | global batch size: 512 | lm loss: 2.163271E+00 | loss scale: 1048576.0 | grad norm: 85054.656 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
validation loss at iteration 20000 | lm loss value: 2.121953E+00 | lm loss PPL: 8.347424E+00 |
-------------------------------------------------------------------------------------------------
iteration 20200/ 152972 | consumed samples: 5262784 | consumed tokens: 10778181632 | elapsed time per iteration (ms): 7254.7 | learning rate: 1.978E-04 | global batch size: 512 | lm loss: 2.156370E+00 | loss scale: 1048576.0 | grad norm: 78594.106 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 20400/ 152972 | consumed samples: 5365184 | consumed tokens: 10987896832 | elapsed time per iteration (ms): 6174.3 | learning rate: 1.977E-04 | global batch size: 512 | lm loss: 2.153392E+00 | loss scale: 1048576.0 | grad norm: 86630.850 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 20600/ 152972 | consumed samples: 5467584 | consumed tokens: 11197612032 | elapsed time per iteration (ms): 6141.6 | learning rate: 1.976E-04 | global batch size: 512 | lm loss: 2.144076E+00 | loss scale: 1048576.0 | grad norm: 90678.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 20800/ 152972 | consumed samples: 5569984 | consumed tokens: 11407327232 | elapsed time per iteration (ms): 6134.5 | learning rate: 1.975E-04 | global batch size: 512 | lm loss: 2.166379E+00 | loss scale: 1048576.0 | grad norm: 94760.884 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 21000/ 152972 | consumed samples: 5672384 | consumed tokens: 11617042432 | elapsed time per iteration (ms): 6190.7 | learning rate: 1.974E-04 | global batch size: 512 | lm loss: 2.168400E+00 | loss scale: 524288.0 | grad norm: 46761.028 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
validation loss at iteration 21000 | lm loss value: 2.114544E+00 | lm loss PPL: 8.285810E+00 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 21000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-03 12:08:02,360] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step21000/mp_rank_01_model_states.pt
[2021-11-03 12:08:02,369] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step21000/mp_rank_00_model_states.pt
[2021-11-03 12:08:02,749] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step21000/zero_pp_rank_0_mp_rank_04_optim_states.pt
[2021-11-03 12:08:02,752] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step21000/zero_pp_rank_1_mp_rank_06_optim_states.pt
[2021-11-03 12:08:02,757] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step21000/zero_pp_rank_0_mp_rank_10_optim_states.pt
[further "zero checkpoint saved" INFO lines for the remaining ZeRO optimizer shards under global_step21000 (zero_pp_rank_0-3 x mp_rank_00-15) omitted]
 successfully saved checkpoint at iteration 21000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1314.54
iteration 21200/ 152972 | consumed samples: 5774784 | consumed tokens: 11826757632 | elapsed time per iteration (ms): 7257.6 | learning rate: 1.973E-04 | global batch size: 512 | lm loss: 2.157695E+00 | loss scale: 524288.0 | grad norm: 46992.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 21400/ 152972 | consumed samples: 5877184 | consumed tokens: 12036472832 | elapsed time per iteration (ms): 6256.0 | learning rate: 1.972E-04 | global
batch size: 512 | lm loss: 2.165456E+00 | loss scale: 1048576.0 | grad norm: 106117.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 21600/ 152972 | consumed samples: 5979584 | consumed tokens: 12246188032 | elapsed time per iteration (ms): 6213.1 | learning rate: 1.971E-04 | global batch size: 512 | lm loss: 2.222055E+00 | loss scale: 131072.0 | grad norm: 147716.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 21800/ 152972 | consumed samples: 6081984 | consumed tokens: 12455903232 | elapsed time per iteration (ms): 6193.3 | learning rate: 1.970E-04 | global batch size: 512 | lm loss: 2.295924E+00 | loss scale: 131072.0 | grad norm: 12071.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [2021-11-03 13:51:33,097] [INFO] [logging.py:68:log_dist] [Rank 0] step=22000, skipped=41, lr=[0.0001968677694572278, 0.0001968677694572278], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 22000 loss: 1.9122 iter time (s): 0.003 samples/sec: 167117.932 iteration 22000/ 152972 | consumed samples: 6184384 | consumed tokens: 12665618432 | elapsed time per iteration (ms): 6158.4 | learning rate: 1.969E-04 | global batch size: 512 | lm loss: 2.179331E+00 | loss scale: 131072.0 | grad norm: 11304.876 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------------- validation loss at iteration 22000 | lm loss value: 2.123673E+00 | lm loss PPL: 8.361794E+00 | ------------------------------------------------------------------------------------------------- iteration 22200/ 152972 | consumed samples: 6286784 | consumed tokens: 12875333632 | elapsed time per iteration (ms): 7103.5 | learning rate: 1.968E-04 | global batch size: 512 | lm loss: 2.141497E+00 | loss scale: 262144.0 | grad norm: 21875.317 | num zeros: 0.0 | number of skipped iterations: 0 | 
number of nan iterations: 0 |
iteration 22400/ 152972 | consumed samples: 6389184 | consumed tokens: 13085048832 | elapsed time per iteration (ms): 6142.3 | learning rate: 1.967E-04 | global batch size: 512 | lm loss: 2.156774E+00 | loss scale: 262144.0 | grad norm: 23609.082 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 22500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-03 14:46:02,079] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step22500/mp_rank_00_model_states.pt
[2021-11-03 14:46:02,080] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step22500/mp_rank_01_model_states.pt
[2021-11-03 14:46:02,442 to 14:46:02,706] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step22500/zero_pp_rank_{0-3}_mp_rank_{00-15}_optim_states.pt (64 per-rank optimizer-state files; repeated save lines condensed)
successfully saved checkpoint at iteration 22500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1161.09
iteration 22600/ 152972 | consumed samples: 6491584 | consumed tokens: 13294764032 | elapsed time per iteration (ms): 6219.2 | learning rate: 1.965E-04 | global batch size: 512 | lm loss: 2.154578E+00 | loss scale: 524288.0 | grad norm: 46552.810 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 22800/ 152972 | consumed samples: 6593984 | consumed tokens: 13504479232 | elapsed time per iteration (ms): 6227.8 | learning rate: 1.964E-04 | global batch size: 512 | lm loss: 2.146162E+00 | loss scale: 524288.0 | grad norm: 166710.433 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 23000/ 152972 | consumed samples: 6696384 | consumed tokens: 13714194432 | elapsed time per iteration (ms): 6234.8 | learning rate: 1.963E-04 | global batch size: 512 | lm loss: 2.142305E+00 | loss scale: 524288.0 | grad norm: 46735.797 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
validation loss at iteration 23000 | lm loss value: 2.145347E+00 | lm loss PPL: 8.545003E+00 |
-------------------------------------------------------------------------------------------------
iteration 23200/ 152972 | consumed samples: 6798784 | consumed tokens: 13923909632 | elapsed time per iteration (ms): 7266.2 | learning rate: 1.962E-04 | global batch size: 512 | lm loss: 2.136395E+00 | loss scale: 1048576.0 | grad norm: 81093.988 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 23400/ 152972 | consumed samples: 6901184 | consumed tokens: 14133624832 | elapsed time per iteration (ms): 6192.7 | learning rate: 1.961E-04 | global batch size: 512 | lm loss: 2.145520E+00 | loss scale: 1048576.0 | grad norm: 104944.193 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 23600/ 152972 | consumed samples: 7003584 | consumed tokens: 14343340032 | elapsed time per iteration (ms): 6207.8 | learning rate: 1.960E-04 | global batch size: 512 | lm loss: 2.137600E+00 | loss scale: 524288.0 | grad norm: 47023.679 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 23800/ 152972 | consumed samples: 7105984 | consumed tokens: 14553055232 | elapsed time per iteration (ms): 6191.3 | learning rate: 1.958E-04 | global batch size: 512 | lm loss: 2.120955E+00 | loss scale: 524288.0 | grad norm: 43944.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-03 17:24:45,898] [INFO] [logging.py:68:log_dist] [Rank 0] step=24000, skipped=43, lr=[0.00019571501545678581, 0.00019571501545678581], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 24000 loss: 2.4001 iter time (s): 0.003 samples/sec: 166338.839
iteration 24000/ 152972 | consumed samples: 7208384 | consumed tokens: 14762770432 | elapsed time per iteration
(ms): 6178.3 | learning rate: 1.957E-04 | global batch size: 512 | lm loss: 2.145050E+00 | loss scale: 1048576.0 | grad norm: 99932.195 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
validation loss at iteration 24000 | lm loss value: 2.104485E+00 | lm loss PPL: 8.202874E+00 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 24000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-03 17:28:23,622] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step24000/mp_rank_01_model_states.pt
[2021-11-03 17:28:23,706] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step24000/mp_rank_00_model_states.pt
[2021-11-03 17:28:24,065 to 17:28:24,419] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step24000/zero_pp_rank_{0-3}_mp_rank_{00-15}_optim_states.pt (per-rank optimizer-state files; repeated save lines condensed)
[2021-11-03 17:28:24,421] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step24000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-03 17:28:24,439] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step24000/zero_pp_rank_0_mp_rank_11_optim_states.pt [2021-11-03 17:28:24,475] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step24000/zero_pp_rank_0_mp_rank_09_optim_states.pt [2021-11-03 17:28:24,534] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step24000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-03 17:28:24,777] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step24000/zero_pp_rank_2_mp_rank_09_optim_states.pt successfully saved checkpoint at iteration 24000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints time (ms) | save-checkpoint: 1889.29 iteration 24200/ 152972 | consumed samples: 7310784 | consumed tokens: 14972485632 | elapsed time per iteration (ms): 7303.5 | learning rate: 1.956E-04 | global batch size: 512 | lm loss: 2.131296E+00 | loss scale: 1048576.0 | grad norm: 95563.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 24400/ 152972 | consumed samples: 7413184 | consumed tokens: 15182200832 | elapsed time per iteration (ms): 6214.0 | learning rate: 1.955E-04 | global batch size: 512 | lm loss: 2.146177E+00 | loss scale: 1048576.0 | grad norm: 87485.484 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 24600/ 152972 | consumed samples: 7515584 | consumed 
tokens: 15391916032 | elapsed time per iteration (ms): 6280.8 | learning rate: 1.953E-04 | global batch size: 512 | lm loss: 2.128909E+00 | loss scale: 1048576.0 | grad norm: 97207.937 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 24800/ 152972 | consumed samples: 7617984 | consumed tokens: 15601631232 | elapsed time per iteration (ms): 6187.3 | learning rate: 1.952E-04 | global batch size: 512 | lm loss: 2.133203E+00 | loss scale: 524288.0 | grad norm: 45912.119 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 25000/ 152972 | consumed samples: 7720384 | consumed tokens: 15811346432 | elapsed time per iteration (ms): 6228.0 | learning rate: 1.951E-04 | global batch size: 512 | lm loss: 2.125260E+00 | loss scale: 524288.0 | grad norm: 43973.275 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------------- validation loss at iteration 25000 | lm loss value: 2.112226E+00 | lm loss PPL: 8.266620E+00 | ------------------------------------------------------------------------------------------------- iteration 25200/ 152972 | consumed samples: 7822784 | consumed tokens: 16021061632 | elapsed time per iteration (ms): 7240.5 | learning rate: 1.949E-04 | global batch size: 512 | lm loss: 2.138931E+00 | loss scale: 1048576.0 | grad norm: 84720.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 25400/ 152972 | consumed samples: 7925184 | consumed tokens: 16230776832 | elapsed time per iteration (ms): 6222.4 | learning rate: 1.948E-04 | global batch size: 512 | lm loss: 2.116920E+00 | loss scale: 1048576.0 | grad norm: 100649.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | saving checkpoint at iteration 25500 to 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-03 20:07:25,627] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step25500/mp_rank_01_model_states.pt
[2021-11-03 20:07:25,703] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step25500/mp_rank_00_model_states.pt
[2021-11-03 20:07:26,726] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step25500/zero_pp_rank_0_mp_rank_04_optim_states.pt
[... repeated "zero checkpoint saved" lines for the remaining zero_pp_rank_*_mp_rank_* optimizer-state shards of global_step25500 elided ...]
successfully saved checkpoint at iteration 25500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 2315.49
iteration 25600/ 152972 | consumed samples: 8027584 | consumed tokens: 16440492032 | elapsed time per iteration (ms): 6242.7 | learning rate: 1.947E-04 | global batch size: 512 | lm loss: 2.127425E+00 | loss scale: 524288.0 | grad norm: 42897.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 25800/ 152972 | consumed samples: 8129984 | consumed tokens: 16650207232 | elapsed time per iteration (ms): 6239.0 | learning rate: 1.945E-04 | global batch size: 512 | lm loss: 2.107143E+00 | loss scale: 524288.0 | grad norm: 45165.579 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-03 20:59:23,949] [INFO] [logging.py:68:log_dist] [Rank 0] step=26000, skipped=48, lr=[0.00019438888040786292, 0.00019438888040786292], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 26000/ 152972 | consumed samples: 8232384 | consumed tokens: 16859922432 | elapsed time per iteration (ms): 6232.0 | learning rate: 1.944E-04 | global batch size: 512 | lm loss: 2.119192E+00 | loss scale: 1048576.0 | grad norm: 88224.772 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
steps: 26000 loss: 2.1227 iter time (s): 0.003 samples/sec: 163161.850
-------------------------------------------------------------------------------------------------
validation loss at iteration 26000 | lm loss value: 2.104676E+00 | lm loss PPL: 8.204442E+00 |
-------------------------------------------------------------------------------------------------
iteration 26200/ 152972 | consumed samples: 8334784 | consumed tokens: 17069637632 | elapsed time per iteration (ms): 7261.1 | learning rate: 1.942E-04 | global batch size: 512 | lm loss: 2.120884E+00 | loss scale: 1048576.0 | grad norm: 86149.086 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 26400/ 152972 | consumed samples: 8437184 | consumed tokens: 17279352832 | elapsed time per iteration (ms): 6236.4 | learning rate: 1.941E-04 | global batch size: 512 | lm loss: 2.129324E+00 | loss scale: 1048576.0 | grad norm: 84281.765 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 26600/ 152972 | consumed samples: 8539584 | consumed tokens: 17489068032 | elapsed time per iteration (ms): 6247.2 | learning rate: 1.940E-04 | global batch size: 512 | lm loss: 2.105178E+00 | loss scale: 1048576.0 | grad norm: 86034.149 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 26800/ 152972 | consumed samples: 8641984 | consumed tokens: 17698783232 | elapsed time per iteration (ms): 6227.6 | learning rate: 1.938E-04 | global batch size: 512 | lm loss: 2.122990E+00 | loss scale: 1048576.0 | grad norm: 92453.833 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 27000/ 152972 | consumed samples: 8744384 | consumed tokens: 17908498432 | elapsed time per iteration (ms): 6241.9 | learning rate: 1.937E-04 | global batch size: 512 | lm loss: 2.119131E+00 | loss scale: 1048576.0 | grad norm: 81408.847 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
validation loss at iteration 27000 | lm loss value: 2.086685E+00 | lm loss PPL: 8.058160E+00 |
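As an aside, the logged quantities above are internally consistent and easy to verify: the reported "lm loss PPL" is just exp(lm loss value), and consumed tokens track consumed samples at 2048 tokens per sample (the sequence length inferred from their ratio), with each logged interval covering 200 iterations at global batch size 512. A quick sanity-check sketch, not part of the training code:

```python
import math

# "lm loss PPL" is exp("lm loss value"); check the validation records above.
for loss, logged_ppl in [(2.104676, 8.204442), (2.086685, 8.058160)]:
    assert abs(math.exp(loss) - logged_ppl) < 1e-3

# Bookkeeping: consumed tokens / consumed samples gives the sequence length,
# and one logged interval advances by 200 iterations * 512 samples.
samples_26800, tokens_26800 = 8641984, 17698783232
samples_27000, tokens_27000 = 8744384, 17908498432
assert tokens_27000 == samples_27000 * 2048          # 2048-token sequences
assert samples_27000 - samples_26800 == 200 * 512    # one logged interval
assert tokens_27000 - tokens_26800 == 200 * 512 * 2048
```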
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 27000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-03 22:50:10,103] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/mp_rank_01_model_states.pt
[2021-11-03 22:50:10,125] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/mp_rank_00_model_states.pt
[2021-11-03 22:50:11,080] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_0_mp_rank_04_optim_states.pt
[... repeated "zero checkpoint saved" lines for the remaining zero_pp_rank_*_mp_rank_* optimizer-state shards of global_step27000 elided ...]
[2021-11-03 22:50:11,238] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_2_mp_rank_14_optim_states.pt [2021-11-03 22:50:11,246] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_0_mp_rank_03_optim_states.pt [2021-11-03 22:50:11,254] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_0_mp_rank_15_optim_states.pt [2021-11-03 22:50:11,256] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_1_mp_rank_12_optim_states.pt [2021-11-03 22:50:11,257] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_2_mp_rank_02_optim_states.pt [2021-11-03 22:50:11,259] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_2_mp_rank_12_optim_states.pt [2021-11-03 22:50:11,261] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_0_mp_rank_02_optim_states.pt [2021-11-03 22:50:11,262] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_1_mp_rank_15_optim_states.pt [2021-11-03 22:50:11,264] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-03 22:50:11,266] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_3_mp_rank_15_optim_states.pt [2021-11-03 22:50:11,274] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_3_mp_rank_13_optim_states.pt [2021-11-03 22:50:11,274] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_3_mp_rank_02_optim_states.pt [2021-11-03 22:50:11,275] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_3_mp_rank_12_optim_states.pt [2021-11-03 22:50:11,293] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-03 22:50:11,300] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-03 22:50:11,308] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_3_mp_rank_03_optim_states.pt [2021-11-03 22:50:11,310] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_3_mp_rank_10_optim_states.pt [2021-11-03 22:50:11,313] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_3_mp_rank_14_optim_states.pt [2021-11-03 22:50:11,340] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_1_mp_rank_03_optim_states.pt [2021-11-03 22:50:11,353] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-03 22:50:11,358] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_0_mp_rank_12_optim_states.pt [2021-11-03 22:50:11,464] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_2_mp_rank_04_optim_states.pt [2021-11-03 22:50:11,475] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-03 22:50:11,534] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_2_mp_rank_13_optim_states.pt [2021-11-03 22:50:11,594] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_1_mp_rank_02_optim_states.pt
[2021-11-03 22:50:11,644] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_2_mp_rank_15_optim_states.pt
[2021-11-03 22:50:12,430] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step27000/zero_pp_rank_2_mp_rank_03_optim_states.pt
successfully saved checkpoint at iteration 27000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 2774.82
iteration 27200/ 152972 | consumed samples: 8846784 | consumed tokens: 18118213632 | elapsed time per iteration (ms): 7250.6 | learning rate: 1.935E-04 | global batch size: 512 | lm loss: 2.120025E+00 | loss scale: 2097152.0 | grad norm: 171192.066 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 27400/ 152972 | consumed samples: 8949184 | consumed tokens: 18327928832 | elapsed time per iteration (ms): 6274.7 | learning rate: 1.934E-04 | global batch size: 512 | lm loss: 2.116377E+00 | loss scale: 1048576.0 | grad norm: 87551.700 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 27600/ 152972 | consumed samples: 9051584 | consumed tokens: 18537644032 | elapsed time per iteration (ms): 6218.6 | learning rate: 1.932E-04 | global batch size: 512 | lm loss: 2.111652E+00 | loss scale: 524288.0 | grad norm: 46189.846 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 27800/ 152972 | consumed samples: 9153984 | consumed tokens: 18747359232 | elapsed time per iteration (ms): 6199.4 | learning rate: 1.931E-04 | global batch size: 512 | lm loss: 2.809586E+00 | loss scale: 32768.0 | grad norm: 3530.619 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-04 00:33:56,259] [INFO] [logging.py:68:log_dist] [Rank 0] step=28000, skipped=58, lr=[0.00019289429310383492, 0.00019289429310383492], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 28000 loss: 2.4090 iter time (s): 0.003 samples/sec: 164994.186
iteration 28000/ 152972 | consumed samples: 9256384 | consumed tokens: 18957074432 | elapsed time per iteration (ms): 6204.0 | learning rate: 1.929E-04 | global batch size: 512 | lm loss: 2.129491E+00 | loss scale: 32768.0 | grad norm: 3274.985 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
validation loss at iteration 28000 | lm loss value: 2.104125E+00 | lm loss PPL: 8.199929E+00 |
-------------------------------------------------------------------------------------------------
iteration 28200/ 152972 | consumed samples: 9358784 | consumed tokens: 19166789632 | elapsed time per iteration (ms): 7238.1 | learning rate: 1.927E-04 | global batch size: 512 | lm loss: 2.118730E+00 | loss scale: 65536.0 | grad norm: 5685.910 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 28400/ 152972 | consumed samples: 9461184 | consumed tokens: 19376504832 | elapsed time per iteration (ms): 6216.6 | learning rate: 1.926E-04 | global batch size: 512 | lm loss: 2.121063E+00 | loss scale: 65536.0 | grad norm: 5641.948 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 28500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-04 01:29:10,847] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/mp_rank_00_model_states.pt
[2021-11-04 01:29:10,854] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/mp_rank_01_model_states.pt [2021-11-04 01:29:11,490] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_07_optim_states.pt [2021-11-04 01:29:11,492] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_07_optim_states.pt [2021-11-04 01:29:11,493] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_07_optim_states.pt [2021-11-04 01:29:11,494] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_11_optim_states.pt [2021-11-04 01:29:11,494] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_09_optim_states.pt [2021-11-04 01:29:11,496] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_10_optim_states.pt [2021-11-04 01:29:11,496] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_06_optim_states.pt [2021-11-04 01:29:11,496] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_07_optim_states.pt [2021-11-04 01:29:11,497] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_06_optim_states.pt [2021-11-04 01:29:11,498] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_09_optim_states.pt [2021-11-04 01:29:11,498] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_11_optim_states.pt [2021-11-04 01:29:11,502] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_10_optim_states.pt [2021-11-04 01:29:11,503] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_09_optim_states.pt [2021-11-04 01:29:11,504] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_04_optim_states.pt [2021-11-04 01:29:11,519] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_05_optim_states.pt [2021-11-04 01:29:11,523] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_08_optim_states.pt [2021-11-04 01:29:11,523] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_10_optim_states.pt [2021-11-04 01:29:11,523] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_06_optim_states.pt [2021-11-04 01:29:11,524] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_05_optim_states.pt [2021-11-04 01:29:11,527] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_04_optim_states.pt [2021-11-04 01:29:11,527] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_09_optim_states.pt [2021-11-04 01:29:11,530] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_05_optim_states.pt [2021-11-04 01:29:11,531] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_08_optim_states.pt [2021-11-04 01:29:11,531] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_04_optim_states.pt [2021-11-04 01:29:11,531] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_08_optim_states.pt [2021-11-04 01:29:11,532] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_11_optim_states.pt [2021-11-04 01:29:11,534] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_11_optim_states.pt [2021-11-04 01:29:11,545] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_08_optim_states.pt [2021-11-04 01:29:11,546] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_10_optim_states.pt [2021-11-04 01:29:11,617] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_13_optim_states.pt [2021-11-04 01:29:11,618] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-04 01:29:11,626] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_12_optim_states.pt [2021-11-04 01:29:11,627] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_12_optim_states.pt [2021-11-04 01:29:11,631] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_15_optim_states.pt [2021-11-04 01:29:11,633] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_14_optim_states.pt [2021-11-04 01:29:11,634] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_14_optim_states.pt [2021-11-04 01:29:11,635] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_15_optim_states.pt [2021-11-04 01:29:11,636] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-04 01:29:11,639] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_03_optim_states.pt [2021-11-04 01:29:11,640] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_15_optim_states.pt [2021-11-04 01:29:11,651] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_05_optim_states.pt [2021-11-04 01:29:11,651] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_06_optim_states.pt [2021-11-04 01:29:11,656] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_03_optim_states.pt [2021-11-04 01:29:11,656] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_14_optim_states.pt [2021-11-04 01:29:11,660] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_02_optim_states.pt [2021-11-04 01:29:11,661] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_13_optim_states.pt [2021-11-04 01:29:11,664] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_03_optim_states.pt [2021-11-04 01:29:11,668] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-04 01:29:11,671] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_13_optim_states.pt [2021-11-04 01:29:11,673] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_12_optim_states.pt [2021-11-04 01:29:11,674] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_15_optim_states.pt [2021-11-04 01:29:11,674] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_12_optim_states.pt [2021-11-04 01:29:11,677] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-04 01:29:11,688] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_02_optim_states.pt [2021-11-04 01:29:11,713] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_04_optim_states.pt [2021-11-04 01:29:11,716] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-04 01:29:11,719] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_14_optim_states.pt [2021-11-04 01:29:11,749] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-04 01:29:11,761] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_03_optim_states.pt [2021-11-04 01:29:11,765] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-04 01:29:11,821] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_02_optim_states.pt [2021-11-04 01:29:11,825] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-04 01:29:11,837] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_02_optim_states.pt [2021-11-04 01:29:12,189] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_13_optim_states.pt
successfully saved checkpoint at iteration 28500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1783.78
iteration 28600/ 152972 | consumed samples: 9563584 | consumed tokens: 19586220032 | elapsed time per iteration (ms): 6237.5 | learning rate: 1.924E-04 | global batch size: 512 | lm loss: 2.118943E+00 | loss scale: 65536.0 | grad norm: 5540.820 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 28800/ 152972 | consumed samples: 9665984 | consumed tokens: 19795935232 | elapsed time per iteration (ms): 6222.3 | learning rate: 1.922E-04 | global batch size: 512 | lm loss: 2.107822E+00 | loss scale: 131072.0 | grad norm: 10244.201 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 29000/ 152972 | consumed samples: 9768384 | consumed tokens: 20005650432 | elapsed time per iteration (ms): 6242.1 | learning rate: 1.921E-04 | global batch size: 512 | lm loss: 2.096643E+00 | loss scale: 131072.0 | grad norm: 10914.822 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
validation loss at iteration 29000 | lm loss value: 2.074451E+00 | lm loss PPL: 7.960179E+00 |
-------------------------------------------------------------------------------------------------
iteration 29200/ 152972 | consumed samples: 9870784 | consumed tokens: 20215365632 | elapsed time per iteration (ms): 7206.9 | learning rate: 1.919E-04 | global batch size: 512 | lm loss: 2.111728E+00 | loss scale: 262144.0 | grad norm: 23340.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 29400/ 152972 | consumed samples: 9973184 | consumed tokens: 20425080832 | elapsed time per iteration (ms): 6187.2 | learning rate: 1.917E-04 | global batch size: 512 | lm loss: 2.093884E+00 | loss scale: 262144.0 | grad norm: 21416.883 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 29600/ 152972 | consumed samples: 10075584 | consumed tokens: 20634796032 | elapsed time per iteration (ms): 6203.2 | learning rate: 1.916E-04 | global batch size: 512 | lm loss: 2.107256E+00 | loss scale: 262144.0 | grad norm: 21814.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 29800/ 152972 | consumed samples: 10177984 | consumed tokens: 20844511232 | elapsed time per iteration (ms): 6205.0 | learning rate: 1.914E-04 | global batch size: 512 | lm loss: 2.096914E+00 | loss scale: 524288.0 | grad norm: 42469.932 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-04 04:07:49,795] [INFO] [logging.py:68:log_dist] [Rank 0] step=30000, skipped=58, lr=[0.0001912222371885727, 0.0001912222371885727], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 30000 loss: 2.1393 iter time (s): 0.003 samples/sec: 165514.637
iteration 30000/ 152972 | consumed samples: 10280384 | consumed tokens: 21054226432 | elapsed time per iteration (ms): 6208.9 | learning rate: 1.912E-04 | global batch size: 512 | lm loss: 2.087625E+00 | loss scale: 524288.0 | grad norm: 43867.710 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
validation loss at iteration 30000 | lm loss value: 2.083765E+00 | lm loss PPL: 8.034659E+00 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 30000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-04 04:11:12,066] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model
checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step30000/mp_rank_00_model_states.pt
[2021-11-04 04:11:12,085] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step30000/mp_rank_01_model_states.pt
[… 64 repeated "[engine.py:2540:_save_zero_checkpoint] zero checkpoint saved …/global_step30000/zero_pp_rank_{0-3}_mp_rank_{00-15}_optim_states.pt" lines, timestamps 2021-11-04 04:11:12,479 through 04:11:12,801, condensed …]
successfully saved checkpoint at iteration 30000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1206.69
iteration 30200/ 152972 | consumed samples: 10382784 | consumed tokens: 21263941632 | elapsed time per iteration (ms): 7244.6 | learning rate: 1.910E-04 | global batch size: 512 | lm loss: 2.081899E+00 | loss scale: 1048576.0 | grad norm: 85511.149 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 30400/ 152972 | consumed samples: 10485184 | consumed tokens: 21473656832 | elapsed time per iteration (ms): 6192.5 | learning rate: 1.909E-04 | global batch size: 512 | lm loss: 2.101988E+00 | loss scale: 1048576.0 | grad norm: 83616.188 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 30600/ 152972 | consumed samples: 10587584 | consumed tokens: 21683372032 | elapsed time per iteration (ms): 6217.4 | learning rate: 1.907E-04 | global batch size: 512 | lm loss: 2.091301E+00 | loss scale: 524288.0 | grad norm: 45865.924 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 30800/ 152972 | consumed samples: 10689984 | consumed tokens: 21893087232 | elapsed time per iteration (ms): 6219.9 | learning rate: 1.905E-04 | global batch size: 512 | lm loss: 2.086251E+00 | loss scale: 524288.0 | grad norm: 47683.541 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 30807 to
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-04 05:34:48,885] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step30807/mp_rank_01_model_states.pt
[2021-11-04 05:34:48,886] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step30807/mp_rank_00_model_states.pt
[… 64 repeated "[engine.py:2540:_save_zero_checkpoint] zero checkpoint saved …/global_step30807/zero_pp_rank_{0-3}_mp_rank_{00-15}_optim_states.pt" lines, timestamps 2021-11-04 05:34:49,273 through 05:34:49,593, condensed …]
successfully saved checkpoint at iteration 30807 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1320.06
[exiting program after 1190.0076513091724 minutes] datetime: 2021-11-04 05:34:49
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
--------------------------------------------------
[WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
[WARNING]  async_io: please install the libaio-devel package with yum
[WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
**** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss ****
[NO]quantizer ..................... [OKAY][NO] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss ****  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... async_io[NO] ...................... [NO][NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]transformer_inference .. [NO] .......utils [OKAY].................. [YES] ...... utils[OKAY] .................. [YES] ...... quantizer[OKAY] .............. [NO] ....... [OKAY]quantizer .............. [NO] .......-------------------------------------------------- [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss ****  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io async_io............... [NO]............... .......[NO] [NO]....... [NO] transformer_inference .. transformer_inference[NO] ......... [NO][OKAY] ....... [OKAY] utils .................. [YES] utils...... ..................[OKAY] [YES] ...... [OKAY] quantizer .............. [NO] quantizer....... ..............[OKAY] [NO] ....... 
[OKAY]-------------------------------------------------- -------------------------------------------------- **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.21.8.2 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed infodeepspeed info ...................................... 0.5.5+58a8e13, 58a8e13, master0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... DeepSpeed general environment info:['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w.torch install path ..................... 
torch 1.8, cuda 11.1 ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY] [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op name op name op name ................................ ................ installed................installedinstalled ....installed.. compatible compatible ..compatible---------------------------------------------------------------------------------------------------- compatible-------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam ...............cpu_adam............... [YES][YES]cpu_adam .......................................... [OKAY][OKAY][YES][YES] ............ [OKAY][OKAY] fused_adam ............. fused_adam[YES] fused_adam......fused_adam............. .............[OKAY][YES]............. [YES] ......[YES] [OKAY]......fused_lamb ...... .............[OKAY][OKAY] fused_lamb[YES] ................... fused_lambfused_lamb[OKAY] [YES] ............. ............. ...... [YES] [YES] [OKAY] ...... ......[OKAY] [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attntransformer ........................sparse_attn sparse_attn [NO] [YES] ..................................... [OKAY][NO][OKAY] [NO] .......transformer....... stochastic_transformer [OKAY]............ [OKAY]. 
[YES]transformer transformer [YES] .................. ............ [YES]...... [OKAY] [YES][OKAY] ...... ......[OKAY] stochastic_transformer [OKAY] . stochastic_transformer[YES] .......stochastic_transformer [YES][OKAY]. ......[YES] [OKAY]...... [OKAY] ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. *****************************************  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop nameop name ................................................................ installed installedinstalled installed .. ......compatible compatible compatible compatible-------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam cpu_adamcpu_adam............... ............... [YES]............... ............... [YES] ......[YES][YES]...... ......[OKAY]......[OKAY] [OKAY][OKAY] fused_adamfused_adam fused_adam.............fused_adam............. .............[YES]............. [YES] [YES] ......[YES] ...... ............ [OKAY][OKAY] [OKAY] [OKAY] fused_lambfused_lambfused_lamb fused_lamb ............. .......................... ............. [YES][YES] [YES] [YES] ...... .................. [OKAY] [OKAY][OKAY] [OKAY] sparse_attnsparse_attnsparse_attnsparse_attn .................................... ............ [NO][NO] [NO][NO]....... .....................[OKAY] [OKAY][OKAY] [OKAY] transformertransformer transformertransformer ............ ............ 
........................[YES] [YES] [YES] [YES]...... ...... ......[OKAY] ...... [OKAY][OKAY][OKAY] stochastic_transformer . [YES]stochastic_transformerstochastic_transformerstochastic_transformer ........ .[OKAY][YES] ......[YES] [YES][OKAY]...... ......[OKAY] [OKAY]  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. 
[NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version DeepSpeed general environment info:..................... 11.2 deepspeed install path ........... torch install path['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info............... ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']...... torch 1.8, cuda 11.1 torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss ****  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY]  [WARNING]  async_io: please install the libaio-devel package with yumquantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report--------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops require ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
--------------------------------------------------
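Since JIT-compiled ops need the ninja build tool at runtime, the first line of the report checks for it. A minimal sketch of that kind of check (a hypothetical re-implementation for illustration, not DeepSpeed's actual code):

```python
import shutil

# JIT-compiled DeepSpeed ops are built with ninja at runtime, so the
# op report first verifies that the ninja binary is on the PATH.
def ninja_available() -> bool:
    return shutil.which("ninja") is not None

status = "[OKAY]" if ninja_available() else "[MISSING]"
print(f"ninja .................. {status}")
```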
**** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss ****
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
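The async_io warnings spell out their own fix; sketched below assuming a yum-based system and a hypothetical /opt/libaio install prefix:

```shell
# Suggested by the warning itself (needs root; shown commented out):
#   yum install -y libaio-devel
# If libaio was instead built from source, point the JIT build at it
# via CFLAGS/LDFLAGS, as the warning advises (paths are hypothetical):
export CFLAGS="-I/opt/libaio/include"
export LDFLAGS="-L/opt/libaio/lib"
echo "CFLAGS=$CFLAGS LDFLAGS=$LDFLAGS"
```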
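The launcher's OMP_NUM_THREADS notice can be acted on before heavy libraries load; a minimal sketch (the thread count of 4 is a hypothetical choice, not a recommendation from the log):

```python
import os

# The launcher defaults OMP_NUM_THREADS to 1 per process to avoid
# oversubscribing CPU cores. Set it explicitly (before importing
# torch/numpy) if CPU-side ops benefit from more threads.
os.environ.setdefault("OMP_NUM_THREADS", "4")
print("OMP_NUM_THREADS =", os.environ["OMP_NUM_THREADS"])
```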
***************************************** -------------------------------------------------- DeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................................... .................. 
..................[OKAY][OKAY] [OKAY][OKAY]-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op name-------------------------------------------------- op name................op name op name................installed................ ................installed..installed installed..compatible.. .. compatiblecompatiblecompatible-------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam cpu_adamcpu_adam............... ..............................[YES]cpu_adam [YES][YES]..................... ...... ......[YES] [OKAY] [OKAY] [OKAY]...... [OKAY] fused_adamfused_adamfused_adam ..........................fused_adam............. [YES] ............. [YES][YES] ...... [YES]...... ...... [OKAY] ...... [OKAY] [OKAY] [OKAY] fused_lamb .............fused_lambfused_lamb fused_lamb.............[YES] ............. .............[YES][YES]...... [YES]......[OKAY] ...... [OKAY]...... [OKAY][OKAY] sparse_attnsparse_attn ........................sparse_attnsparse_attn [NO][NO]........................ .......[NO].......[NO] .......[OKAY]....... [OKAY] [OKAY][OKAY]transformer transformertransformer............ transformer ............ ............[YES]............ [YES][YES][YES] ...... ...... ......[OKAY]...... [OKAY][OKAY][OKAY] stochastic_transformer .stochastic_transformerstochastic_transformer stochastic_transformer [YES] ......... [YES][YES][YES][OKAY] .................. 
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meets the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops require ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ...............
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
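An environment summary like the block above can be approximated by hand when DeepSpeed's own reporting is unavailable. A minimal stdlib-only sketch; the helper name `env_report` and the output layout are illustrative, not part of Megatron or DeepSpeed:

```python
import importlib.metadata
import platform


def env_report(packages=("torch", "deepspeed")):
    """Build a small environment summary, similar in spirit to
    DeepSpeed's "general environment info" block above."""
    lines = [f"python version ................... {platform.python_version()}"]
    for name in packages:
        try:
            version = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            # Mirror the log's [NO] marker for missing components.
            version = "[NO]"
        dots = "." * max(1, 20 - len(name))
        lines.append(f"{name} version {dots} {version}")
    return "\n".join(lines)


print(env_report())
```

This only reads installed-package metadata; the real DeepSpeed report additionally inspects the CUDA toolkit and the wheel's compile-time versions.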
**** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss ****
> setting tensorboard ...
torch 1.8, cuda 11.1 DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version ....................['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] 1.8.2 torch versiontorch cuda version ................................... 1.8.211.1 nvcc versiontorch cuda version .................................... 11.211.1 deepspeed install pathnvcc version ................................ 11.2['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed install pathdeepspeed info .............................. 0.5.5+58a8e13, 58a8e13, master['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed wheel compiled w.deepspeed info ......................... torch 1.8, cuda 11.10.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** **** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss **** using world size: 64, data-parallel-size: 4, tensor-model-parallel size: 4, pipeline-model-parallel size: 4 using torch.float16 for parameters ... ------------------------ arguments ------------------------ accumulate_allreduce_grads_in_fp32 .............. False adam_beta1 ...................................... 0.9 adam_beta2 ...................................... 0.999 adam_eps ........................................ 1e-08 adlr_autoresume ................................. False adlr_autoresume_interval ........................ 1000 apply_query_key_layer_scaling ................... True apply_residual_connection_post_layernorm ........ False attention_dropout ............................... 0.1 attention_softmax_in_fp32 ....................... 
False bert_binary_head ................................ True bert_load ....................................... None bf16 ............................................ False bias_dropout_fusion ............................. True bias_gelu_fusion ................................ True biencoder_projection_dim ........................ 0 biencoder_shared_query_context_model ............ False block_data_path ................................. None checkpoint_activations .......................... True checkpoint_in_cpu ............................... False checkpoint_num_layers ........................... 1 clip_grad ....................................... 1.0 codecarbon_dir .................................. None consumed_train_samples .......................... 0 consumed_train_tokens ........................... 0 consumed_valid_samples .......................... 0 contigious_checkpointing ........................ False cpu_optimizer ................................... False cpu_torch_adam .................................. False curriculum_learning ............................. False data_impl ....................................... mmap data_parallel_size .............................. 4 data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document'] dataloader_type ................................. single DDP_impl ........................................ local decoder_seq_length .............................. None deepscale ....................................... False deepscale_config ................................ None deepspeed ....................................... True deepspeed_activation_checkpointing .............. True deepspeed_config ................................ ./ds_config.1825190.json deepspeed_mpi ................................... False distribute_checkpointed_activations ............. False distributed_backend ............................. 
nccl embedding_path .................................. None encoder_seq_length .............................. 2048 eod_mask_loss ................................... False eval_interval ................................... 1000 eval_iters ...................................... 100 evidence_data_path .............................. None exit_duration_in_mins ........................... 1190 exit_interval ................................... None ffn_hidden_size ................................. 8192 finetune ........................................ False fp16 ............................................ True fp16_lm_cross_entropy ........................... False fp32_residual_connection ........................ False gigaflos_no_embeds .............................. 0 global_batch_size ............................... 512 glu_activation .................................. None hidden_dropout .................................. 0.1 hidden_size ..................................... 2048 hysteresis ...................................... 2 ict_head_size ................................... None ict_load ........................................ None img_dim ......................................... 224 indexer_batch_size .............................. 128 indexer_log_interval ............................ 1000 init_method_std ................................. 0.02 init_method_xavier_uniform ...................... False initial_loss_scale .............................. 4294967296 kv_channels ..................................... 128 layernorm_epsilon ............................... 1e-05 lazy_mpu_init ................................... None load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints local_rank ...................................... 0 log_batch_size_to_tensorboard ................... True log_interval .................................... 
200 log_learning_rate_to_tensorboard ................ True log_level ....................................... None log_level_replica ............................... None log_loss_scale_to_tensorboard ................... True log_num_zeros_in_grad ........................... False log_params_norm ................................. False log_timers_to_tensorboard ....................... True log_validation_ppl_to_tensorboard ............... True loss_on_targets_only ............................ True loss_scale ...................................... 12.0 loss_scale_window ............................... 1000 lr .............................................. 0.0002 lr_decay_iters .................................. None lr_decay_samples ................................ 73242187 lr_decay_style .................................. cosine lr_decay_tokens ................................. None lr_warmup_fraction .............................. None lr_warmup_iters ................................. 0 lr_warmup_samples ............................... 183105 make_vocab_size_divisible_by .................... 128 mask_prob ....................................... 0.15 masked_softmax_fusion ........................... True max_position_embeddings ......................... 2048 memory_centric_tiled_linear ..................... False merge_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/data/gpt2-merges.txt micro_batch_size ................................ 8 min_loss_scale .................................. 1.0 min_lr .......................................... 1e-05 mmap_warmup ..................................... False no_load_optim ................................... None no_load_rng ..................................... None no_save_optim ................................... None no_save_rng ..................................... None num_attention_heads ............................. 
16
  num_channels .................................... 3
  num_classes ..................................... 1000
  num_layers ...................................... 24
  num_layers_per_virtual_pipeline_stage ........... None
  num_workers ..................................... 2
  onnx_safe ....................................... None
  openai_gelu ..................................... False
  optimizer ....................................... adam
  override_lr_scheduler ........................... False
  params_dtype .................................... torch.float16
  partition_activations ........................... False
  patch_dim ....................................... 16
  pipeline_model_parallel_size .................... 4
  position_embedding_type ......................... PositionEmbeddingType.absolute
  profile_backward ................................ False
  query_in_block_prob ............................. 0.1
  rampup_batch_size ............................... ['32', '32', '2_000_000']
  rank ............................................ 0
  remote_device ................................... none
  reset_attention_mask ............................ False
  reset_position_ids .............................. False
  retriever_report_topk_accuracies ................ []
  retriever_score_scaling ......................... False
  retriever_seq_length ............................ 256
  sample_rate ..................................... 1.0
  save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
  save_interval ................................... 1500
  scatter_gather_tensors_in_pipeline .............. True
  scattered_embeddings ............................ False
  seed ............................................ 1234
  seq_length ...................................... 2048
  sgd_momentum .................................... 0.9
  short_seq_prob .................................. 0.1
  split ...........................................
949,50,1
  split_transformers .............................. False
  synchronize_each_layer .......................... False
  tensor_model_parallel_size ...................... 4
  tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/tr6-1B3-prefix-lm-unbiased-loss-logs/tensorboard
  tensorboard_log_interval ........................ 1
  tensorboard_queue_size .......................... 5
  tile_factor ..................................... 1
  titles_data_path ................................ None
  tokenizer_name_or_path .......................... None
  tokenizer_type .................................. GPT2BPETokenizer
  train_iters ..................................... None
  train_samples ................................... 73242187
  train_tokens .................................... None
  use_bnb_optimizer ............................... False
  use_checkpoint_lr_scheduler ..................... False
  use_contiguous_buffers_in_ddp ................... False
  use_cpu_initialization .......................... None
  use_one_sent_docs ............................... False
  use_pin_memory .................................. False
  virtual_pipeline_model_parallel_size ............ None
  vocab_extra_ids ................................. 0
  vocab_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/data/gpt2-vocab.json
  weight_decay .................................... 0.1
  world_size ...................................... 64
  zero_allgather_bucket_size ...................... 0.0
  zero_contigious_gradients ....................... False
  zero_reduce_bucket_size ......................... 0.0
  zero_reduce_scatter ............................. False
  zero_stage ......................................
1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
> building GPT2BPETokenizer tokenizer ...
**** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss ****
> padded vocab (size: 50257) with 431 dummy tokens (new size: 50688)
> initializing torch distributed ...
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed.
Op compatibility means that your system meet the required dependencies to JIT install the op.
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ......
[OKAY]
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
**** Git info for Megatron: git_hash=0d4308b git_branch=thomas/fix_aggregated_loss ****
> initializing tensor model parallel with size 4
> initializing pipeline model parallel with size 4
> setting random seeds to 1234 ...
[2021-11-04 05:38:40,414] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.328 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension.
Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 50.903 seconds
time to initialize megatron (seconds): 136.596
[after megatron is initialized] datetime: 2021-11-04 05:39:36
building GPT model ...
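Several derived numbers in this log follow arithmetically from the argument dump: the padded vocab (50257 → 50688 with 431 dummy tokens), the batch-size rampup (32 → 512 in steps of 32 over 2,000,000 samples), and the 64-rank topology printed next (tensor=4 × pipeline=4 × data=4). The sketch below reproduces them; the helper names are hypothetical, and the rampup schedule is an approximation of Megatron's stepwise increment rule, not its exact implementation:

```python
import math

# Values taken from the argument dump above.
MAKE_DIVISIBLE_BY = 128   # make_vocab_size_divisible_by
TP = 4                    # tensor_model_parallel_size
PP = 4                    # pipeline_model_parallel_size
WORLD_SIZE = 64
DP = WORLD_SIZE // (TP * PP)   # data-parallel degree: 4

def padded_vocab_size(orig_vocab_size: int) -> int:
    # Megatron pads the vocab to a multiple of
    # make_vocab_size_divisible_by * tensor_model_parallel_size,
    # so the embedding shards evenly across tensor-parallel ranks.
    multiple = MAKE_DIVISIBLE_BY * TP
    return multiple * math.ceil(orig_vocab_size / multiple)

def rampup_batch_size(consumed_samples: int, start=32, increment=32,
                      ramp_samples=2_000_000, final=512) -> int:
    # rampup_batch_size = ['32', '32', '2_000_000']: grow the global batch
    # from `start` to `final` in steps of `increment`, spread evenly over
    # `ramp_samples` consumed samples (approximate schedule).
    steps = (final - start) // increment
    samples_per_step = ramp_samples // steps
    return start + increment * min(consumed_samples // samples_per_step, steps)

def coord_to_rank(pipe: int, data: int, model: int) -> int:
    # Rank layout matching the printed topology: the model (tensor)
    # dimension varies fastest, then data, then pipe.
    return pipe * (DP * TP) + data * TP + model

print(padded_vocab_size(50257))      # 50688, i.e. 431 dummy tokens
print(rampup_batch_size(0))          # 32 at the start of training
print(rampup_batch_size(2_000_000))  # 512 once rampup completes
print(coord_to_rank(1, 0, 0))        # 16, as in the topology map
```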
[2021-11-04 05:39:36,647] [INFO] [utils.py:806:see_memory_usage] Before Building Model
[2021-11-04 05:39:36,648] [INFO] [utils.py:807:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-11-04 05:39:36,648] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 41.41 GB, percent = 22.1%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=0, model=1): 1, ProcessCoord(pipe=0, data=0, model=2): 2, ProcessCoord(pipe=0, data=0, model=3): 3, ProcessCoord(pipe=0, data=1, model=0): 4, ProcessCoord(pipe=0, data=1, model=1): 5, ProcessCoord(pipe=0, data=1, model=2): 6, ProcessCoord(pipe=0, data=1, model=3): 7, ProcessCoord(pipe=0, data=2, model=0): 8, ProcessCoord(pipe=0, data=2, model=1): 9, ProcessCoord(pipe=0, data=2, model=2): 10, ProcessCoord(pipe=0, data=2, model=3): 11, ProcessCoord(pipe=0, data=3, model=0): 12, ProcessCoord(pipe=0, data=3, model=1): 13, ProcessCoord(pipe=0, data=3, model=2): 14, ProcessCoord(pipe=0, data=3, model=3): 15, ProcessCoord(pipe=1, data=0, model=0): 16, ProcessCoord(pipe=1, data=0, model=1): 17, ProcessCoord(pipe=1, data=0, model=2): 18, ProcessCoord(pipe=1, data=0, model=3): 19, ProcessCoord(pipe=1, data=1, model=0): 20, ProcessCoord(pipe=1, data=1, model=1): 21, ProcessCoord(pipe=1, data=1, model=2): 22, ProcessCoord(pipe=1, data=1, model=3): 23, ProcessCoord(pipe=1, data=2, model=0): 24, ProcessCoord(pipe=1, data=2, model=1): 25, ProcessCoord(pipe=1, data=2, model=2): 26, ProcessCoord(pipe=1, data=2, model=3): 27, ProcessCoord(pipe=1, data=3, model=0): 28, ProcessCoord(pipe=1, data=3, model=1): 29, ProcessCoord(pipe=1, data=3, model=2): 30, ProcessCoord(pipe=1, data=3, model=3): 31, ProcessCoord(pipe=2, data=0, model=0): 32, ProcessCoord(pipe=2, data=0, model=1): 33, ProcessCoord(pipe=2, data=0, model=2): 34, ProcessCoord(pipe=2, data=0, model=3): 35, ProcessCoord(pipe=2, data=1, model=0): 36, ProcessCoord(pipe=2, data=1, model=1): 37, ProcessCoord(pipe=2, data=1, model=2): 38, ProcessCoord(pipe=2, data=1, model=3): 39, ProcessCoord(pipe=2, data=2, model=0): 40, ProcessCoord(pipe=2, data=2, model=1): 41, ProcessCoord(pipe=2, data=2, model=2): 42, ProcessCoord(pipe=2, data=2, model=3): 43, ProcessCoord(pipe=2, data=3, model=0): 44, ProcessCoord(pipe=2, data=3, model=1): 45, ProcessCoord(pipe=2, data=3, model=2): 46, ProcessCoord(pipe=2, data=3, model=3): 47, ProcessCoord(pipe=3, data=0, model=0): 48, ProcessCoord(pipe=3, data=0, model=1): 49, ProcessCoord(pipe=3, data=0, model=2): 50, ProcessCoord(pipe=3, data=0, model=3): 51, ProcessCoord(pipe=3, data=1, model=0): 52, ProcessCoord(pipe=3, data=1, model=1): 53, ProcessCoord(pipe=3, data=1, model=2): 54, ProcessCoord(pipe=3, data=1, model=3): 55, ProcessCoord(pipe=3, data=2, model=0): 56, ProcessCoord(pipe=3, data=2, model=1): 57, ProcessCoord(pipe=3, data=2, model=2): 58, ProcessCoord(pipe=3, data=2, model=3): 59, ProcessCoord(pipe=3, data=3, model=0): 60, ProcessCoord(pipe=3, data=3, model=1): 61, ProcessCoord(pipe=3, data=3, model=2): 62, ProcessCoord(pipe=3, data=3, model=3): 63}
[2021-11-04 05:39:37,171] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=9
     0: _to_float16
     1: EmbeddingPipe
     2:
     3: ParallelTransformerLayerPipe
     4: ParallelTransformerLayerPipe
     5: ParallelTransformerLayerPipe
     6: ParallelTransformerLayerPipe
     7: ParallelTransformerLayerPipe
     8: ParallelTransformerLayerPipe
stage=1 layers=6
     9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=2 layers=6
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
stage=3 layers=11
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
    27:
    28:
    29: MixedFusedLayerNorm
    30: EmbeddingPipe
    31: float16_to_fp32
  loss: CrossEntropy
 > number of parameters on (tensor, pipeline) model parallel rank (1, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (2, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (0, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (1, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (0, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (2, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (3, 1): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (3, 2): 75592704
 > number of parameters on (tensor, pipeline) model parallel rank (1, 0): 105739264
 > number of parameters on (tensor, pipeline) model parallel rank (1, 3): 105743360
 > number of parameters on (tensor, pipeline) model parallel rank (0, 3): 105743360
 > number of parameters on (tensor, pipeline) model parallel rank (3, 3): 105743360
 > number of parameters on (tensor, pipeline) model parallel rank (2, 3): 105743360
 > number of parameters on (tensor, pipeline) model parallel rank (3, 0): 105739264
 > number of parameters on (tensor, pipeline) model parallel rank (2, 0): 105739264
[2021-11-04 05:39:37,563] [INFO] [utils.py:806:see_memory_usage] After Building Model
[2021-11-04 05:39:37,564] [INFO] [utils.py:807:see_memory_usage] MA 0.21 GB Max_MA 0.21 GB CA 0.22 GB Max_CA 0 GB
[2021-11-04 05:39:37,564] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 41.65 GB, percent = 22.2%
 > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 105739264
setting training iterations to 152972
> learning rate decay style: cosine
DeepSpeed is enabled.
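The `Using topology:` dump above enumerates a 4 × 4 × 4 grid (pipeline × data × tensor parallelism) over the 64 GPUs. Its rank assignment can be reproduced with simple row-major index arithmetic; this is a sketch whose axis ordering is read off the printed dictionary, not taken from the Megatron-DeepSpeed source:

```python
# Reconstruct the ProcessCoord -> rank mapping printed in the topology dump:
# pipe is the slowest-varying axis, then data, then (tensor) model.
PIPE, DATA, MODEL = 4, 4, 4

def coord_to_rank(pipe: int, data: int, model: int) -> int:
    return (pipe * DATA + data) * MODEL + model

# Spot-checks against entries from the log:
assert coord_to_rank(0, 0, 0) == 0
assert coord_to_rank(1, 0, 0) == 16
assert coord_to_rank(2, 1, 1) == 37
assert coord_to_rank(3, 3, 3) == 63
```

This layout keeps the four tensor-parallel ranks of each stage on consecutive ranks (i.e. on the same node), which is where the bandwidth-hungry tensor-parallel all-reduces happen.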
[2021-11-04 05:39:37,583] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.5.5+58a8e13, git-hash=58a8e13, git-branch=master
[2021-11-04 05:39:37,653] [INFO] [engine.py:207:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-11-04 05:39:37,653] [INFO] [engine.py:862:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-11-04 05:39:37,653] [INFO] [engine.py:868:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-11-04 05:39:37,654] [INFO] [engine.py:884:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-11-04 05:39:37,654] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-11-04 05:39:37,654] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-11-04 05:39:37,654] [INFO] [stage2.py:111:__init__] Reduce bucket size 500000000
[2021-11-04 05:39:37,654] [INFO] [stage2.py:112:__init__] Allgather bucket size 500000000
[2021-11-04 05:39:37,654] [INFO] [stage2.py:113:__init__] CPU Offload: False
[2021-11-04 05:39:37,654] [INFO] [stage2.py:114:__init__] Round robin gradient partitioning: False
Rank: 22 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 26 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 25 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 21 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 29 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 17 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 30 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 18 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 23 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 27 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 19 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 31 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 45 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 33 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 28 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 40 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 16 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 20 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 34 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 38 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 35 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 44 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 41 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 39 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 43 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 42 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 37 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 24 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 32 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 47 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 46 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 36 partition count [4, 4] and sizes[(18874368, False), (23808, False)]
Rank: 11 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 6 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 60 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 62 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 1 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 3 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 58 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 54 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 50 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 49 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 51 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 48 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 9 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 61 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 13 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 15 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 52 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 7 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 5 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 10 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 53 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 14 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 56 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 2 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 55 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 59 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 57 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 63 partition count [4, 4] and sizes[(26411008, False), (24832, False)]
Rank: 0 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 8 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 12 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
Rank: 4 partition count [4, 4] and sizes[(26411008, False), (23808, False)]
[2021-11-04 05:39:37,967] [INFO] [utils.py:806:see_memory_usage] Before initializing optimizer states
[2021-11-04 05:39:37,967] [INFO] [utils.py:807:see_memory_usage] MA 0.3 GB Max_MA 0.35 GB CA 0.59 GB Max_CA 1 GB
[2021-11-04 05:39:37,968] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 42.46 GB, percent = 22.7%
[2021-11-04 05:39:37,994] [INFO] [utils.py:806:see_memory_usage] After initializing optimizer states
[2021-11-04 05:39:37,994] [INFO] [utils.py:807:see_memory_usage] MA 0.49 GB Max_MA 0.59 GB CA 0.89 GB Max_CA 1 GB
[2021-11-04 05:39:37,995] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 42.46 GB, percent = 22.7%
[2021-11-04 05:39:37,995] [INFO] [stage2.py:474:__init__] optimizer state initialized
[2021-11-04 05:39:38,018] [INFO] [utils.py:806:see_memory_usage] After initializing ZeRO optimizer
[2021-11-04 05:39:38,019] [INFO] [utils.py:807:see_memory_usage] MA 0.49 GB Max_MA 0.49 GB CA 0.89 GB Max_CA 1 GB
[2021-11-04 05:39:38,019] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 42.46 GB, percent = 22.7%
[2021-11-04 05:39:38,019] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-11-04 05:39:38,019] [INFO] [engine.py:599:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-11-04 05:39:38,019] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-11-04 05:39:38,019] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-11-04 05:39:38,019] [INFO] [config.py:940:print] DeepSpeedEngine configuration:
[2021-11-04 05:39:38,019] [INFO] [config.py:944:print]   activation_checkpointing_config  {
    "partition_activations": false,
    "contiguous_memory_optimization": false,
    "cpu_checkpointing": false,
    "number_checkpoints": null,
    "synchronize_checkpoint_boundary": false,
    "profile":
false
}
[2021-11-04 05:39:38,019] [INFO] [config.py:944:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-11-04 05:39:38,019] [INFO] [config.py:944:print]   allreduce_always_fp32 ........ False
[2021-11-04 05:39:38,019] [INFO] [config.py:944:print]   amp_enabled .................. False
[2021-11-04 05:39:38,019] [INFO] [config.py:944:print]   amp_params ................... False
[2021-11-04 05:39:38,019] [INFO] [config.py:944:print]   checkpoint_tag_validation_enabled  True
[2021-11-04 05:39:38,019] [INFO] [config.py:944:print]   checkpoint_tag_validation_fail  False
[2021-11-04 05:39:38,019] [INFO] [config.py:944:print]   curriculum_enabled ........... False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   curriculum_params ............ False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   dataloader_drop_last ......... False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   disable_allgather ............ False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   dump_state ................... False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   eigenvalue_enabled ........... False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   eigenvalue_gas_boundary_resolution  1
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   eigenvalue_layer_name ........ bert.encoder.layer
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   eigenvalue_layer_num ......... 0
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   eigenvalue_max_iter .......... 100
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   eigenvalue_stability ......... 1e-06
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   eigenvalue_tol ............... 0.01
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   eigenvalue_verbose ........... False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   elasticity_enabled ........... False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   flops_profiler_config ........ {
    "enabled": false,
    "profile_step": 1,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": null
}
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   fp16_enabled ................. True
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   fp16_master_weights_and_gradients  False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   fp16_mixed_quantize .......... False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   global_rank .................. 0
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   gradient_accumulation_steps .. 16
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   gradient_clipping ............ 1.0
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   gradient_predivide_factor .... 1.0
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   initial_dynamic_scale ........ 4096
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   loss_scale ................... 0
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   memory_breakdown ............. False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   optimizer_legacy_fusion ...... False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   optimizer_name ............... None
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   optimizer_params ............. None
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   pld_enabled .................. False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   pld_params ................... False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   prescale_gradients ........... False
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   quantize_change_rate ......... 0.001
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   quantize_groups .............. 1
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   quantize_offset .............. 1000
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   quantize_period .............. 1000
[2021-11-04 05:39:38,020] [INFO] [config.py:944:print]   quantize_rounding ............ 0
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   quantize_start_bits .......... 16
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   quantize_target_bits ......... 8
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   quantize_training_enabled .... False
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   quantize_type ................ 0
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   quantize_verbose ............. False
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   scheduler_name ............... None
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   scheduler_params ............. None
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   sparse_attention ............. None
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   sparse_gradients_enabled ..... False
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   steps_per_print .............. 2000
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   tensorboard_enabled .......... False
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   tensorboard_job_name ......... DeepSpeedJobName
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   tensorboard_output_path ......
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   train_batch_size ............. 512
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   train_micro_batch_size_per_gpu  8
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   use_quantizer_kernel ......... False
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   wall_clock_breakdown ......... False
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   world_size ................... 4
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   zero_allow_untested_optimizer  False
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   zero_config .................. {
    "stage": 1,
    "contiguous_gradients": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 5.000000e+08,
    "allgather_partitions": true,
    "allgather_bucket_size": 5.000000e+08,
    "overlap_comm": false,
    "load_from_fp32_weights": true,
    "elastic_checkpoint": true,
    "offload_param": null,
    "offload_optimizer": null,
    "sub_group_size": 1.000000e+09,
    "prefetch_bucket_size": 5.000000e+07,
    "param_persistence_threshold": 1.000000e+05,
    "max_live_parameters": 1.000000e+09,
    "max_reuse_distance": 1.000000e+09,
    "gather_fp16_weights_on_model_save": false,
    "ignore_unused_parameters": true,
    "round_robin_gradients": false,
    "legacy_stage1": false
}
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   zero_enabled ................. True
[2021-11-04 05:39:38,021] [INFO] [config.py:944:print]   zero_optimization_stage ...... 1
[2021-11-04 05:39:38,021] [INFO] [config.py:946:print]   json = {
    "train_micro_batch_size_per_gpu": 8,
    "train_batch_size": 512,
    "gradient_clipping": 1.0,
    "zero_optimization": {
        "stage": 1
    },
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 500,
        "hysteresis": 2,
        "min_loss_scale": 1,
        "initial_scale_power": 12
    },
    "steps_per_print": 2.000000e+03,
    "wall_clock_breakdown": false
}
[2021-11-04 05:39:38,021] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=16 micro_batch_size=8
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=105739264 (105.739M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=1 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=105739264 (105.739M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=33 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=35 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=34 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=32 STAGE=2 LAYERS=6 [15, 21) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=2 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=105739264 (105.739M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=3 STAGE=0 LAYERS=9 [0, 9) STAGE_PARAMS=105739264 (105.739M)
TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=17 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=19 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=16 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=18 STAGE=1 LAYERS=6 [9, 15) STAGE_PARAMS=75592704 (75.593M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=48 STAGE=3 LAYERS=11 [21, 32) STAGE_PARAMS=105743360 (105.743M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=49 STAGE=3 LAYERS=11 [21, 32) STAGE_PARAMS=105743360 (105.743M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=50 STAGE=3 LAYERS=11 [21, 32) STAGE_PARAMS=105743360 (105.743M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
[2021-11-04 05:39:38,312] [INFO] [engine.py:151:__init__] RANK=51 STAGE=3 LAYERS=11 [21, 32) STAGE_PARAMS=105743360 (105.743M) TOTAL_PARAMS=1450672128 (1450.672M) UNIQUE_PARAMS=1330085888 (1330.086M)
 > using checkpoint value 0.0002 for learning rate
 > using checkpoint value 1e-05 for minimum learning rate
 > using checkpoint value 183105 for warmup iterations
 > using checkpoint value 73242187 for total number of iterations
 > using checkpoint value cosine for decay style
successfully loaded 4 ZeRO state_dicts for rank 35
loading 4 zero partition checkpoints for rank 35
successfully loaded 4 ZeRO state_dicts for rank 33
successfully loaded 4 ZeRO state_dicts for rank 28
successfully loaded 4 ZeRO state_dicts for rank 43
successfully loaded 4 ZeRO state_dicts for rank 34
successfully loaded 4 ZeRO state_dicts for rank 20
loading 4 zero partition checkpoints for rank 33
loading 4 zero partition checkpoints for rank 28
loading 4 zero partition checkpoints for rank 43
loading 4 zero partition checkpoints for rank 34
loading 4 zero partition checkpoints for rank 20
successfully loaded 4 ZeRO state_dicts for rank 47
successfully loaded 4 ZeRO state_dicts for rank 59
loading 4 zero partition checkpoints for rank 47
successfully loaded 4 ZeRO state_dicts for rank 46
loading 4 zero partition checkpoints for rank 59
loading 4 zero partition checkpoints for rank 46
successfully loaded 4 ZeRO state_dicts for rank 36
successfully loaded 4 ZeRO state_dicts for rank 37
successfully loaded 4 ZeRO state_dicts for rank 18
successfully loaded 4 ZeRO state_dicts for rank 26
loading 4 zero partition checkpoints for rank 36
loading 4 zero partition checkpoints for rank 37
successfully loaded 4 ZeRO state_dicts for rank 45
loading 4 zero partition checkpoints for rank 18
successfully loaded 4 ZeRO state_dicts for rank 19
loading 4 zero partition checkpoints for rank 26
successfully loaded 4 ZeRO state_dicts for rank 23
successfully loaded 4 ZeRO state_dicts for rank 22
successfully loaded 4 ZeRO state_dicts for rank 39
loading 4 zero partition checkpoints for rank 45
loading 4 zero partition checkpoints for rank 19
loading 4 zero partition checkpoints for rank 23
loading 4 zero partition checkpoints for rank 22
loading 4 zero partition checkpoints for rank 39
successfully loaded 4 ZeRO state_dicts for rank 40
successfully loaded 4 ZeRO state_dicts for rank 42
successfully loaded 4 ZeRO state_dicts for rank 41
loading 4 zero partition checkpoints for rank 40
successfully loaded 4 ZeRO state_dicts for rank 8
successfully loaded 4 ZeRO state_dicts for rank 32
loading 4 zero partition checkpoints for rank 42
successfully loaded 4 ZeRO state_dicts for rank 44
successfully loaded 4 ZeRO state_dicts for rank 0
successfully loaded 4 ZeRO state_dicts for rank 48
loading 4 zero partition checkpoints for rank 41
successfully loaded 4 ZeRO state_dicts for rank 24
successfully loaded 4 ZeRO state_dicts for rank 16
successfully loaded 4 ZeRO state_dicts for rank 27
loading 4 zero partition checkpoints for rank 32
loading 4 zero partition checkpoints for rank 8
loading 4 zero partition checkpoints for rank 44
loading 4 zero partition checkpoints for rank 24
loading 4 zero partition checkpoints for rank 0
checkpoint version 3.0
loading 4 zero partition checkpoints for rank 16
successfully loaded 4 ZeRO state_dicts for rank 38
loading 4 zero partition checkpoints for rank 48
loading 4 zero partition checkpoints for rank 27
successfully loaded 4 ZeRO state_dicts for rank 57
loading 4 zero partition checkpoints for rank 38
loading 4 zero partition checkpoints for rank 57
successfully loaded 4 ZeRO state_dicts for rank 31
successfully loaded 4 ZeRO state_dicts for rank 12
successfully loaded 4 ZeRO state_dicts for rank 49
successfully loaded 4 ZeRO state_dicts for rank 56
successfully loaded 4 ZeRO state_dicts for rank 4
successfully loaded 4 ZeRO state_dicts for rank 29
loading 4 zero partition checkpoints for rank 31
successfully loaded 4 ZeRO state_dicts for rank 60
loading 4 zero partition checkpoints for rank 12
loading 4 zero partition checkpoints for rank 49
successfully loaded 4 ZeRO state_dicts for rank 15
loading 4 zero partition checkpoints for rank 56
loading 4 zero partition checkpoints for rank 29
loading 4 zero partition checkpoints for rank 4
successfully loaded 4 ZeRO state_dicts for rank 6
successfully loaded 4 ZeRO state_dicts for rank 52
loading 4 zero partition checkpoints for rank 60
successfully loaded 4 ZeRO state_dicts for rank 5
loading 4 zero partition checkpoints for rank 15
successfully loaded 4 ZeRO state_dicts for rank 30
successfully loaded 4 ZeRO state_dicts for rank 25
loading 4 zero partition checkpoints for rank 6
loading 4 zero partition checkpoints for rank 30
loading 4 zero partition checkpoints for rank 52
loading 4 zero partition checkpoints for rank 5
successfully loaded 4 ZeRO state_dicts for rank 11
loading 4 zero partition checkpoints for rank 25
successfully loaded 4 ZeRO state_dicts for rank 61
successfully loaded 4 ZeRO state_dicts for rank 53
successfully loaded 4 ZeRO state_dicts for rank 13
loading 4 zero partition checkpoints for rank 11
successfully loaded 4 ZeRO state_dicts for rank 62
loading 4 zero partition checkpoints for rank 61
loading 4 zero partition checkpoints for rank 53
loading 4 zero partition checkpoints for rank 13
successfully loaded 4 ZeRO state_dicts for rank 2
loading 4 zero partition checkpoints for rank 62
successfully loaded 4 ZeRO state_dicts for rank 17
successfully loaded 4 ZeRO state_dicts for rank 51
successfully loaded 4 ZeRO state_dicts for rank 58
successfully loaded 4 ZeRO state_dicts for rank 21
successfully loaded 4 ZeRO state_dicts for rank 14
loading 4 zero partition checkpoints for rank 2
successfully loaded 4 ZeRO state_dicts for rank 10
loading 4 zero partition checkpoints for rank 17
loading 4 zero partition checkpoints for rank 21
loading 4 zero partition checkpoints for rank 51
loading 4 zero partition checkpoints for rank 58
loading 4 zero partition checkpoints for rank 14
loading 4 zero partition checkpoints for rank 10
successfully loaded 4 ZeRO state_dicts for rank 55
successfully loaded 4 ZeRO state_dicts for rank 9
successfully loaded 4 ZeRO state_dicts for rank 1
loading 4 zero partition checkpoints for rank 55
loading 4 zero partition checkpoints for rank 9
successfully loaded 4 ZeRO state_dicts for rank 54
successfully loaded 4 ZeRO state_dicts for rank 50
loading 4 zero partition checkpoints for rank 1
successfully loaded 4 ZeRO state_dicts for rank 63
successfully loaded 4 ZeRO state_dicts for rank 3
loading 4 zero partition checkpoints for rank 54
loading 4 zero partition checkpoints for rank 50
loading 4 zero partition checkpoints for rank 63
loading 4 zero partition checkpoints for rank 3
successfully loaded 4 ZeRO state_dicts for rank 7
loading 4 zero partition checkpoints for rank 7
  successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints at iteration 30807
time (ms) | load-checkpoint: 15796.49
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP 
> 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/code/Megatron-DeepSpeed/megatron/utils.py:276: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") 
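The identical UserWarning appears once per worker because Python's `warnings` module deduplicates only within a single process; a multi-rank job therefore reprints it for every rank. A minimal, hypothetical clean-up (not part of the original run) would install a per-process filter before model construction:

```python
import warnings

# With the filter active in a worker process, the Megatron parameter-count
# warning is dropped instead of being printed once per rank.
with warnings.catch_warnings(record=True) as caught:
    warnings.filterwarnings(
        "ignore",
        message=r"Parameter count with the embeddings will be inaccurate",
        category=UserWarning,
    )
    warnings.warn(
        "Parameter count with the embeddings will be inaccurate with PP > 1, "
        "as the first and last stage hold several copies of the embeddings"
    )

# caught stays empty: the warning was filtered before display.
print(len(caught))
```

Each process keeps its own filter list, so the call has to run in every rank (e.g. early in the training script), not just on rank 0.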
estimated model parameters: 1.209483264
estimated model parameters: 1.691828224
estimated model parameters: 1.69189376
estimated model parameters without embeddings: 1.209483264
estimated model parameters: 1.691828224
estimated model parameters: 1.69189376
estimated model parameters without embeddings: 1.209483264
estimated model parameters without embeddings: 1.2095488
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-11-04 05:39:54
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
    train:      73242187
    validation: 7833600
    test:       51200
> building train, validation, and test datasets for GPT ...
> building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
> finished creating indexed dataset in 4.238386 seconds
    number of documents: 304230423
> dataset split:
    train:      document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test:       document indices in [303926193, 304230423) total of 304230 documents
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.236 seconds
    total number of samples: 131537224
    total number of epochs: 1
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.278 seconds
    total number of samples: 13854322
    total number of epochs: 2
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.078 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
Number of parameters: 1.691828224 billion
Number of parameters without embeddings: 1.209483264 billion
time (ms) | model-and-optimizer-setup: 17582.97 | train/valid/test-data-iterators-setup: 9691.56
[after dataloaders are built] datetime: 2021-11-04 05:40:06
done with setup ...
training ...
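As a quick sanity check on the two parameter counts reported above, the gap between the total and the "without embeddings" figure is the embedding parameters. This is a minimal arithmetic sketch (both values copied verbatim from the log); the interpretation of the difference as the embedding matrices is an assumption, not something the log states:

```python
# Arithmetic check on the two parameter counts printed above (values from the log).
total_params = 1_691_828_224           # "Number of parameters: 1.691828224 billion"
non_embedding_params = 1_209_483_264   # "Number of parameters without embeddings: 1.209483264 billion"

# Assumed: the difference is the embedding parameters.
embedding_params = total_params - non_embedding_params
print(embedding_params)                           # 482344960 -> ~0.48B parameters
print(round(embedding_params / total_params, 3))  # 0.285 -> roughly 28.5% of the model
```

For a 1.3B-class model this large embedding share is expected: the vocabulary projection dominates at small hidden sizes and shrinks in relative terms as models grow.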
Number of parameters: 1.209483264 billion
Number of parameters: 1.691828224 billion
Number of parameters: 1.69189376 billion
Number of parameters without embeddings: 1.209483264 billion
Number of parameters without embeddings: 1.2095488 billion
[before the start of training step] datetime: 2021-11-04 05:40:06
[2021-11-04 05:40:06,719] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information
[2021-11-04 05:40:06,720] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-11-04 05:40:06,720] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-11-04 05:40:06,720] [INFO] [checkpointing.py:554:forward] ----Synchronization False
[2021-11-04 05:40:06,720] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False
[Rank 35] (after 31000 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3000.8916015625 | reserved: 4454.0 | max reserved: 4454.0
[Rank 3] (after 31000 iterations) memory (MB) | allocated: 681.95849609375 | max allocated: 4309.45458984375 | reserved: 5760.0 | max reserved: 5760.0
[Rank 19] (after 31000 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3432.89208984375 | reserved: 4918.0 | max reserved: 4918.0
[Rank 51] (after 31000 iterations) memory (MB) | allocated: 2393.11083984375 | max allocated: 4380.0517578125 | reserved: 6912.0 | max reserved: 6912.0
[Rank 2] (after 31000 iterations) memory (MB) | allocated: 681.95849609375 | max allocated: 4309.45458984375 | reserved: 5712.0 | max reserved: 5712.0
[Rank 18] (after 31000 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3432.89208984375 | reserved: 4918.0 | max reserved: 4918.0
[Rank 50] (after 31000 iterations) memory (MB) | allocated: 2393.11083984375 | max allocated: 4379.5517578125 | reserved: 6848.0 | max reserved: 6848.0
[Rank 34] (after 31000 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3000.8916015625 | reserved: 4326.0 | max reserved: 4326.0
iteration 31000/ 152972 | consumed samples: 10792384 | consumed tokens: 22102802432 | elapsed time per iteration (ms): 6251.2 | learning rate: 1.903E-04 | global batch size: 512 | lm loss: 2.069374E+00 | loss scale: 524288.0 | grad norm: 38134.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[Rank 0] (after 31000 iterations) memory (MB) | allocated: 681.95849609375 | max allocated: 4309.45458984375 | reserved: 5760.0 | max reserved: 5760.0
[Rank 16] (after 31000 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3432.89208984375 | reserved: 4774.0 | max reserved: 4774.0
[Rank 32] (after 31000 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3000.8916015625 | reserved: 4342.0 | max reserved: 4342.0
[Rank 48] (after 31000 iterations) memory (MB) | allocated: 2393.11083984375 | max allocated: 4379.5517578125 | reserved: 7464.0 | max reserved: 7464.0
[Rank 33] (after 31000 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3000.8916015625 | reserved: 4342.0 | max reserved: 4342.0
[Rank 17] (after 31000 iterations) memory (MB) | allocated: 424.458984375 | max allocated: 3432.89208984375 | reserved: 4774.0 | max reserved: 4774.0
[Rank 49] (after 31000 iterations) memory (MB) | allocated: 2393.11083984375 | max allocated: 4379.5517578125 | reserved: 7464.0 | max reserved: 7464.0
[Rank 1] (after 31000 iterations) memory (MB) | allocated: 681.95849609375 | max allocated: 4309.45458984375 | reserved: 5760.0 | max reserved: 5760.0
-------------------------------------------------------------------------------------------------
 validation loss at iteration 31000 | lm loss value: 2.049838E+00 | lm loss PPL: 7.766640E+00 |
-------------------------------------------------------------------------------------------------
iteration 31200/ 152972 | consumed samples: 10894784 | consumed tokens: 22312517632 | elapsed time per iteration (ms): 7079.7 | learning rate: 1.901E-04 | global batch size: 512 | lm loss: 2.067551E+00 | loss scale: 1048576.0 | grad norm: 75132.397 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 31400/ 152972 | consumed samples: 10997184 | consumed tokens: 22522232832 | elapsed time per iteration (ms): 6104.1 | learning rate: 1.900E-04 | global batch size: 512 | lm loss: 2.066499E+00 | loss scale: 1048576.0 | grad norm: 94494.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 31500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-04 06:54:24,876] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step31500/mp_rank_00_model_states.pt
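Two consistency checks on the records above, under conventions assumed here rather than stated in the log: lm loss PPL is exp(lm loss), and consumed tokens equal consumed samples times the sequence length, which the `2048sl` suffix in the index-map filenames suggests is 2048:

```python
import math

# Validation record at iteration 31000: lm loss 2.049838E+00, lm loss PPL 7.766640E+00.
# Assumed convention: PPL = exp(loss).
ppl = math.exp(2.049838)
print(round(ppl, 4))   # ~7.7666, matching the reported PPL

# Iteration 31000 record: consumed samples 10792384, consumed tokens 22102802432.
# Assumed: sequence length 2048 (from the "2048sl" index-map filenames).
tokens = 10_792_384 * 2048
print(tokens)          # 22102802432, matching the log
```

Both relations holding exactly is a useful quick test that a log excerpt has not been garbled in transcription.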
[2021-11-04 06:54:24,884] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step31500/mp_rank_01_model_states.pt
[2021-11-04 06:54:25,251] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step31500/zero_pp_rank_0_mp_rank_08_optim_states.pt
[2021-11-04 06:54:25,471] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step31500/zero_pp_rank_3_mp_rank_01_optim_states.pt
  successfully saved checkpoint at iteration 31500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1166.57
iteration 31600/ 152972 | consumed samples: 11099584 | consumed tokens: 22731948032 | elapsed time per iteration (ms): 6150.5 | learning rate: 1.898E-04 | global batch size: 512 | lm loss: 2.087465E+00 | loss scale: 2097152.0 | grad norm: 144070.937 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 31800/ 152972 | consumed samples: 11201984 | consumed tokens: 22941663232 | elapsed time per iteration (ms): 6192.3 | learning rate: 1.896E-04 | global batch size: 512 | lm loss: 2.053407E+00 | loss scale: 2097152.0 | grad norm: 162708.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-04 07:45:40,958] [INFO] [logging.py:68:log_dist] [Rank 0] step=32000, skipped=63, lr=[0.00018938783712130853, 0.00018938783712130853], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 32000/ 152972 | consumed samples: 11304384 | consumed tokens: 23151378432 | elapsed time per iteration (ms): 6113.0 | learning rate: 1.894E-04 | global batch size: 512 | lm loss: 2.080148E+00 | loss scale: 524288.0 | grad norm: 42846.790 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
steps: 32000 loss: 2.0391 iter time (s): 0.003 samples/sec: 168813.429
-------------------------------------------------------------------------------------------------
 validation loss at iteration 32000 | lm loss value: 2.024950E+00 | lm loss PPL: 7.575734E+00 |
-------------------------------------------------------------------------------------------------
iteration 32200/ 152972 | consumed samples: 11406784 | consumed tokens: 23361093632 | elapsed time per iteration (ms): 7101.5 | learning rate: 1.892E-04 | global batch size: 512 | lm loss: 2.079365E+00 | loss scale: 524288.0 | grad norm: 44540.167 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 32400/ 152972 | consumed samples: 11509184 | consumed tokens: 23570808832 | elapsed time per iteration (ms): 6063.2 | learning rate: 1.890E-04 | global batch size: 512 | lm loss: 2.076157E+00 | loss scale: 524288.0 | grad norm: 45004.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 32600/ 152972 | consumed samples: 11611584 | consumed tokens: 23780524032 | elapsed time per iteration (ms): 6065.5 | learning rate: 1.888E-04 | global batch size: 512 | lm loss: 2.062989E+00 | loss scale: 1048576.0 | grad norm: 86946.987 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 32800/ 152972 | consumed samples: 11713984 | consumed tokens: 23990239232 | elapsed time per iteration (ms): 6070.9 | learning rate: 1.886E-04 | global batch size: 512 | lm loss: 2.065025E+00 | loss scale: 1048576.0 | grad norm: 79200.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 33000/ 152972 | consumed samples: 11816384 | consumed tokens: 24199954432 | elapsed time per iteration (ms): 6072.8 | learning rate: 1.884E-04 | global batch size: 512 | lm loss: 2.098187E+00 | loss scale: 2097152.0 | grad norm: 178195.066 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
 validation loss at iteration 33000 | lm loss value: 2.043273E+00 | lm loss PPL: 7.715822E+00 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 33000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-04 09:33:41,742] [INFO] [logging.py:68:log_dist] [Rank 1]
Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/mp_rank_01_model_states.pt
[2021-11-04 09:33:41,994] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/mp_rank_00_model_states.pt
[2021-11-04 09:33:42,384] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_0_mp_rank_08_optim_states.pt
[2021-11-04 09:33:42,554] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_3_mp_rank_14_optim_states.pt [2021-11-04 09:33:42,554] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_3_mp_rank_15_optim_states.pt [2021-11-04 09:33:42,554] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_2_mp_rank_15_optim_states.pt [2021-11-04 09:33:42,555] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_3_mp_rank_12_optim_states.pt [2021-11-04 09:33:42,555] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_2_mp_rank_02_optim_states.pt [2021-11-04 09:33:42,559] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_0_mp_rank_13_optim_states.pt [2021-11-04 09:33:42,559] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_2_mp_rank_12_optim_states.pt [2021-11-04 09:33:42,561] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_3_mp_rank_07_optim_states.pt [2021-11-04 09:33:42,563] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_1_mp_rank_03_optim_states.pt [2021-11-04 09:33:42,564] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_1_mp_rank_13_optim_states.pt [2021-11-04 09:33:42,565] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_0_mp_rank_03_optim_states.pt [2021-11-04 09:33:42,565] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_1_mp_rank_12_optim_states.pt [2021-11-04 09:33:42,568] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_1_mp_rank_15_optim_states.pt [2021-11-04 09:33:42,568] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-04 09:33:42,579] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-04 09:33:42,583] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-04 09:33:42,584] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_3_mp_rank_05_optim_states.pt [2021-11-04 09:33:42,584] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_3_mp_rank_03_optim_states.pt [2021-11-04 09:33:42,589] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_0_mp_rank_09_optim_states.pt [2021-11-04 09:33:42,593] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-04 09:33:42,595] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_2_mp_rank_05_optim_states.pt [2021-11-04 09:33:42,596] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_2_mp_rank_06_optim_states.pt [2021-11-04 09:33:42,604] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_3_mp_rank_06_optim_states.pt [2021-11-04 09:33:42,621] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_1_mp_rank_09_optim_states.pt [2021-11-04 09:33:42,658] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_3_mp_rank_00_optim_states.pt
[2021-11-04 09:33:42,676] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step33000/zero_pp_rank_0_mp_rank_15_optim_states.pt
successfully saved checkpoint at iteration 33000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1498.88
iteration 33200/ 152972 | consumed samples: 11918784 | consumed tokens: 24409669632 | elapsed time per iteration (ms): 7105.0 | learning rate: 1.882E-04 | global batch size: 512 | lm loss: 2.071210E+00 | loss scale: 1048576.0 | grad norm: 90033.149 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 33400/ 152972 | consumed samples: 12021184 | consumed tokens: 24619384832 | elapsed time per iteration (ms): 6158.6 | learning rate: 1.880E-04 | global batch size: 512 | lm loss: 2.091822E+00 | loss scale: 1048576.0 | grad norm: 85197.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 33600/ 152972 | consumed samples: 12123584 | consumed tokens: 24829100032 | elapsed time per iteration (ms): 6330.3 | learning rate: 1.878E-04 | global batch size: 512 | lm loss: 2.090412E+00 | loss scale: 1048576.0 | grad norm: 89363.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 33800/ 152972 | consumed samples: 12225984 | consumed tokens: 25038815232 | elapsed time per iteration (ms): 6316.8 | learning rate: 1.876E-04 | global batch size: 512 | lm loss: 2.065592E+00 | loss scale: 2097152.0 | grad norm: 171576.527 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-04 11:17:01,704] [INFO] [logging.py:68:log_dist] [Rank 0] step=34000, skipped=66, lr=[0.00018738857969774513, 0.00018738857969774513], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 34000 loss: 1.8980 iter time (s): 0.003 samples/sec: 168783.649
iteration 34000/ 152972 | consumed samples: 12328384 | consumed tokens: 25248530432 | elapsed time per iteration (ms): 6119.2 | learning rate: 1.874E-04 | global batch size: 512 | lm loss: 2.063387E+00 | loss scale: 2097152.0 | grad norm: 156525.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
validation loss at iteration 34000 | lm loss value: 2.036580E+00 | lm loss PPL: 7.664353E+00 |
-------------------------------------------------------------------------------------------------
iteration 34200/ 152972 | consumed samples: 12430784 | consumed tokens: 25458245632 | elapsed time per iteration (ms): 7156.1 | learning rate: 1.872E-04 | global batch size: 512 | lm loss: 2.061139E+00 | loss scale: 1048576.0 | grad norm: 86702.237 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 34400/ 152972 | consumed samples: 12533184 | consumed tokens: 25667960832 | elapsed time per iteration (ms): 6079.2 | learning rate: 1.870E-04 | global batch size: 512 | lm loss: 2.071460E+00 | loss scale: 1048576.0 | grad norm: 88209.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 34500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-04 12:11:17,372] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/mp_rank_00_model_states.pt
[2021-11-04 12:11:17,391] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint:
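Two relationships in the records above can be checked directly: the reported "lm loss PPL" is the exponential of the "lm loss value", and "consumed tokens" stays at a fixed multiple of "consumed samples", which suggests a sequence length of 2048 tokens per sample (this length is inferred from the ratio; the log itself does not state it). A quick sanity check, using figures copied verbatim from the log:

```python
import math

# Validation record at iteration 34000: (lm loss value, reported PPL).
loss, reported_ppl = 2.036580, 7.664353
# Perplexity is the exponential of the mean per-token cross-entropy loss.
assert abs(math.exp(loss) - reported_ppl) < 1e-4

# (consumed samples, consumed tokens) pairs from the iteration records.
records = [
    (11918784, 24409669632),  # iteration 33200
    (12021184, 24619384832),  # iteration 33400
    (12328384, 25248530432),  # iteration 34000
]
# Every record implies the same tokens-per-sample ratio.
seq_lens = {tokens // samples for samples, tokens in records}
print(seq_lens)  # → {2048}
```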
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/mp_rank_01_model_states.pt [2021-11-04 12:11:17,775] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_07_optim_states.pt [2021-11-04 12:11:17,777] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_04_optim_states.pt [2021-11-04 12:11:17,780] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_10_optim_states.pt [2021-11-04 12:11:17,783] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_07_optim_states.pt [2021-11-04 12:11:17,783] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_06_optim_states.pt [2021-11-04 12:11:17,784] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_05_optim_states.pt [2021-11-04 12:11:17,786] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_11_optim_states.pt [2021-11-04 12:11:17,786] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_06_optim_states.pt [2021-11-04 12:11:17,789] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_08_optim_states.pt [2021-11-04 12:11:17,789] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_09_optim_states.pt [2021-11-04 12:11:17,789] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_09_optim_states.pt [2021-11-04 12:11:17,791] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_05_optim_states.pt [2021-11-04 12:11:17,791] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_08_optim_states.pt [2021-11-04 12:11:17,793] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_08_optim_states.pt [2021-11-04 12:11:17,793] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_04_optim_states.pt [2021-11-04 12:11:17,800] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_04_optim_states.pt [2021-11-04 12:11:17,803] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_09_optim_states.pt [2021-11-04 12:11:17,803] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_11_optim_states.pt [2021-11-04 12:11:17,804] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_05_optim_states.pt [2021-11-04 12:11:17,805] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_11_optim_states.pt [2021-11-04 12:11:17,811] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_07_optim_states.pt [2021-11-04 12:11:17,811] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_08_optim_states.pt [2021-11-04 12:11:17,812] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_06_optim_states.pt [2021-11-04 12:11:17,812] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_07_optim_states.pt [2021-11-04 12:11:17,815] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_10_optim_states.pt [2021-11-04 12:11:17,818] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_10_optim_states.pt [2021-11-04 12:11:17,818] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_05_optim_states.pt [2021-11-04 12:11:17,819] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_06_optim_states.pt [2021-11-04 12:11:17,821] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_04_optim_states.pt [2021-11-04 12:11:17,826] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_10_optim_states.pt [2021-11-04 12:11:17,827] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_11_optim_states.pt [2021-11-04 12:11:17,831] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_09_optim_states.pt [2021-11-04 12:11:17,908] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_14_optim_states.pt [2021-11-04 12:11:17,909] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_03_optim_states.pt [2021-11-04 12:11:17,910] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_12_optim_states.pt [2021-11-04 12:11:17,913] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_02_optim_states.pt [2021-11-04 12:11:17,916] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-04 12:11:17,917] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_03_optim_states.pt [2021-11-04 12:11:17,917] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_12_optim_states.pt [2021-11-04 12:11:17,918] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_14_optim_states.pt [2021-11-04 12:11:17,923] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_13_optim_states.pt [2021-11-04 12:11:17,923] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_12_optim_states.pt [2021-11-04 12:11:17,923] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-04 12:11:17,924] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_15_optim_states.pt [2021-11-04 12:11:17,925] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-04 12:11:17,925] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-04 12:11:17,926] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-04 12:11:17,937] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_03_optim_states.pt [2021-11-04 12:11:17,937] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_03_optim_states.pt [2021-11-04 12:11:17,949] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_15_optim_states.pt [2021-11-04 12:11:17,952] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_14_optim_states.pt [2021-11-04 12:11:17,952] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-04 12:11:17,953] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_15_optim_states.pt [2021-11-04 12:11:17,954] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_13_optim_states.pt [2021-11-04 12:11:17,954] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_02_optim_states.pt [2021-11-04 12:11:17,954] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_13_optim_states.pt [2021-11-04 12:11:17,955] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-04 12:11:17,956] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_15_optim_states.pt [2021-11-04 12:11:17,957] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_02_optim_states.pt [2021-11-04 12:11:17,958] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_02_optim_states.pt [2021-11-04 12:11:17,960] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_13_optim_states.pt [2021-11-04 12:11:17,961] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-04 12:11:17,961] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_14_optim_states.pt [2021-11-04 12:11:17,966] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_12_optim_states.pt
successfully saved checkpoint at iteration 34500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1120.16
iteration 34600/ 152972 | consumed samples: 12635584 | consumed tokens: 25877676032 | elapsed time per iteration (ms): 6196.1 | learning rate: 1.868E-04 | global batch size: 512 | lm loss: 2.065935E+00 | loss scale: 2097152.0 | grad norm: 263118.984 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 34800/ 152972 | consumed samples: 12737984 | consumed tokens: 26087391232 | elapsed time per iteration (ms): 6079.0 | learning rate: 1.865E-04 | global batch size: 512 | lm loss: 2.050999E+00 | loss scale: 2097152.0 | grad norm: 153991.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 35000/ 152972 | consumed samples: 12840384 | consumed tokens: 26297106432 | elapsed time per iteration (ms): 6069.3 | learning rate: 1.863E-04 | global batch size: 512 | lm loss: 2.064597E+00 | loss scale: 1048576.0 | grad norm: 80465.124 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
validation loss at iteration 35000 | lm loss value: 2.041205E+00 | lm loss PPL: 7.699883E+00 |
-------------------------------------------------------------------------------------------------
iteration 35200/ 152972 | consumed samples: 12942784 | consumed tokens: 26506821632 | elapsed time per iteration (ms): 7104.2 | learning rate: 1.861E-04 | global batch size: 512 | lm loss: 2.308412E+00 | loss scale: 32768.0 | grad norm: 15419.527 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 35400/ 152972 | consumed samples: 13045184 | consumed tokens: 26716536832 | elapsed time per iteration (ms): 6068.8 | learning rate: 1.859E-04 | global batch size: 512 | lm loss: 2.139549E+00 | loss scale: 32768.0 | grad norm: 2699.080 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 35600/ 152972 | consumed samples: 13147584 | consumed tokens: 26926252032 | elapsed time per iteration (ms): 6053.8 | learning rate: 1.857E-04 | global batch size: 512 | lm loss: 2.072784E+00 | loss scale: 32768.0 | grad norm: 2697.586 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 35800/ 152972 | consumed samples: 13249984 | consumed tokens: 27135967232 | elapsed time per iteration (ms): 6055.5 | learning rate: 1.855E-04 | global batch size: 512 | lm loss: 2.069739E+00 | loss scale: 65536.0 | grad norm: 5183.529 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-04 14:46:46,818] [INFO] [logging.py:68:log_dist] [Rank 0] step=36000, skipped=74, lr=[0.00018523568489549322, 0.00018523568489549322], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 36000 loss: 1.9417 iter time (s): 0.003 samples/sec: 169401.892
iteration 36000/ 152972 | consumed samples: 13352384 | consumed tokens: 27345682432 | elapsed time per iteration (ms): 6063.3 | learning rate: 1.852E-04 | global batch size: 512 | lm loss: 2.069384E+00 | loss scale: 65536.0 | grad norm: 4855.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
validation loss at iteration 36000 | lm loss value: 2.035071E+00 | lm loss PPL: 7.652793E+00 |
-------------------------------------------------------------------------------------------------
saving checkpoint at iteration 36000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-04 14:50:13,431] [INFO] [logging.py:68:log_dist] [Rank 0] Saving
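The loss-scale movements in the records above (2097152.0 dropping to 32768.0 around iteration 35200 while the skipped-iteration counter climbs from 66 to 74, then recovering through 65536.0) follow the usual fp16 dynamic loss-scaling pattern: halve the scale and skip the optimizer step whenever the gradients overflow, then double it again after a long run of overflow-free steps. A minimal sketch of that mechanism only; the growth interval of 2000 and the exact update rule here are generic illustrations, not this run's DeepSpeed configuration:

```python
class DynamicLossScaler:
    """Toy illustration of fp16 dynamic loss scaling (halve on overflow,
    double after a window of clean steps)."""

    def __init__(self, init_scale: float = 2.0 ** 20, growth_interval: int = 2000):
        self.scale = float(init_scale)
        self.growth_interval = growth_interval
        self.good_steps = 0
        self.skipped = 0  # mirrors the log's "skipped=" counter

    def update(self, found_overflow: bool) -> None:
        if found_overflow:
            self.scale /= 2          # back off; this optimizer step is skipped
            self.good_steps = 0
            self.skipped += 1
        else:
            self.good_steps += 1
            if self.good_steps >= self.growth_interval:
                self.scale *= 2      # probe a larger scale again
                self.good_steps = 0

scaler = DynamicLossScaler(init_scale=2.0 ** 21)
for _ in range(5):                   # five consecutive overflowing steps
    scaler.update(found_overflow=True)
print(int(scaler.scale), scaler.skipped)  # 2097152 halved five times → 65536 5
```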
model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step36000/mp_rank_00_model_states.pt
[2021-11-04 14:50:13,458] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step36000/mp_rank_01_model_states.pt
[2021-11-04 14:50:13,813] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step36000/zero_pp_rank_0_mp_rank_04_optim_states.pt
[… further "zero checkpoint saved" lines elided, one per remaining zero_pp_rank_{0..3}_mp_rank_{00..15} optimizer shard …]
successfully saved checkpoint at iteration 36000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1139.27
iteration 36200/ 152972 | consumed samples: 13454784 | consumed tokens: 27555397632 | elapsed time per iteration (ms): 7166.9 | learning rate: 1.850E-04 | global batch size: 512 | lm loss: 2.067782E+00 | loss scale: 131072.0 | grad norm: 10772.264 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 36400/ 152972 | consumed samples: 13557184 | consumed tokens: 27765112832 | elapsed time per iteration (ms): 6264.7 | learning rate: 1.848E-04 | global batch size: 512 | lm loss: 2.064779E+00 | loss scale: 131072.0 | grad norm: 9982.568 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 36600/ 152972 | consumed samples: 13659584 | consumed tokens: 27974828032 | elapsed time per iteration (ms): 6172.6 | learning rate: 1.846E-04 | global batch size: 512 | lm loss: 2.058450E+00 | loss scale: 131072.0 | grad norm: 10480.622 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 36800/ 152972 | consumed samples: 13761984 | consumed tokens: 28184543232 | elapsed time per iteration (ms): 6160.5 | learning rate: 1.843E-04 | global batch size: 512 | lm loss: 2.054813E+00 | loss scale: 262144.0 | grad norm: 21606.940 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 37000/ 152972 | consumed samples: 13864384 | consumed tokens: 28394258432 | elapsed time per iteration (ms):
6155.4 | learning rate: 1.841E-04 | global batch size: 512 | lm loss: 2.065561E+00 | loss scale: 262144.0 | grad norm: 22388.672 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------------
 validation loss at iteration 37000 | lm loss value: 2.038658E+00 | lm loss PPL: 7.680298E+00 |
-------------------------------------------------------------------------------------------------
iteration 37200/ 152972 | consumed samples: 13966784 | consumed tokens: 28603973632 | elapsed time per iteration (ms): 7324.4 | learning rate: 1.839E-04 | global batch size: 512 | lm loss: 2.053246E+00 | loss scale: 524288.0 | grad norm: 38638.738 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 37400/ 152972 | consumed samples: 14069184 | consumed tokens: 28813688832 | elapsed time per iteration (ms): 6088.0 | learning rate: 1.836E-04 | global batch size: 512 | lm loss: 2.044884E+00 | loss scale: 524288.0 | grad norm: 42099.513 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 37500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
[2021-11-04 17:28:04,918] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step37500/mp_rank_01_model_states.pt
[2021-11-04 17:28:04,926] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step37500/mp_rank_00_model_states.pt
[2021-11-04 17:28:05,293] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step37500/zero_pp_rank_2_mp_rank_04_optim_states.pt
[2021-11-04 17:28:05,294]
[INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step37500/zero_pp_rank_0_mp_rank_09_optim_states.pt
[… further "zero checkpoint saved" lines elided, one per remaining zero_pp_rank_{0..3}_mp_rank_{00..15} optimizer shard …]
successfully saved checkpoint at iteration 37500 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints
time (ms) | save-checkpoint: 1121.51
iteration 37600/ 152972 | consumed samples: 14171584 | consumed tokens: 29023404032 | elapsed time per iteration (ms): 6099.0 | learning rate: 1.834E-04 | global batch size: 512 | lm loss: 2.056508E+00 | loss scale: 524288.0 | grad norm: 42464.581 | num
zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 37800/ 152972 | consumed samples: 14273984 | consumed tokens: 29233119232 | elapsed time per iteration (ms): 6074.2 | learning rate: 1.832E-04 | global batch size: 512 | lm loss: 2.055578E+00 | loss scale: 1048576.0 | grad norm: 81142.019 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [2021-11-04 18:18:44,735] [INFO] [logging.py:68:log_dist] [Rank 0] step=38000, skipped=75, lr=[0.00018292011486489588, 0.00018292011486489588], mom=[(0.9, 0.999), (0.9, 0.999)] iteration 38000/ 152972 | consumed samples: 14376384 | consumed tokens: 29442834432 | elapsed time per iteration (ms): 6083.7 | learning rate: 1.829E-04 | global batch size: 512 | lm loss: 2.050276E+00 | loss scale: 1048576.0 | grad norm: 83594.919 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | steps: 38000 loss: 2.0121 iter time (s): 0.003 samples/sec: 167999.346 ------------------------------------------------------------------------------------------------- validation loss at iteration 38000 | lm loss value: 2.026610E+00 | lm loss PPL: 7.588318E+00 | ------------------------------------------------------------------------------------------------- iteration 38200/ 152972 | consumed samples: 14478784 | consumed tokens: 29652549632 | elapsed time per iteration (ms): 7248.0 | learning rate: 1.827E-04 | global batch size: 512 | lm loss: 2.045791E+00 | loss scale: 1048576.0 | grad norm: 88471.890 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 38400/ 152972 | consumed samples: 14581184 | consumed tokens: 29862264832 | elapsed time per iteration (ms): 6081.8 | learning rate: 1.824E-04 | global batch size: 512 | lm loss: 2.060999E+00 | loss scale: 1048576.0 | grad norm: 83390.688 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 38600/ 152972 | consumed samples: 14683584 | 
consumed tokens: 30071980032 | elapsed time per iteration (ms): 6089.2 | learning rate: 1.822E-04 | global batch size: 512 | lm loss: 2.034178E+00 | loss scale: 1048576.0 | grad norm: 76433.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 38800/ 152972 | consumed samples: 14785984 | consumed tokens: 30281695232 | elapsed time per iteration (ms): 6089.2 | learning rate: 1.820E-04 | global batch size: 512 | lm loss: 2.041228E+00 | loss scale: 1048576.0 | grad norm: 81479.524 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 39000/ 152972 | consumed samples: 14888384 | consumed tokens: 30491410432 | elapsed time per iteration (ms): 6387.9 | learning rate: 1.817E-04 | global batch size: 512 | lm loss: 2.068646E+00 | loss scale: 2097152.0 | grad norm: 195257.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------------- validation loss at iteration 39000 | lm loss value: 2.030519E+00 | lm loss PPL: 7.618039E+00 | ------------------------------------------------------------------------------------------------- saving checkpoint at iteration 39000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints [2021-11-04 20:08:39,298] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/mp_rank_00_model_states.pt [2021-11-04 20:08:39,301] [INFO] [logging.py:68:log_dist] [Rank 1] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/mp_rank_01_model_states.pt [2021-11-04 20:08:39,700] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_04_optim_states.pt [2021-11-04 20:08:39,701] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_07_optim_states.pt [2021-11-04 20:08:39,702] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_10_optim_states.pt [2021-11-04 20:08:39,702] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_05_optim_states.pt [2021-11-04 20:08:39,704] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_08_optim_states.pt [2021-11-04 20:08:39,705] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_07_optim_states.pt [2021-11-04 20:08:39,707] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_11_optim_states.pt [2021-11-04 20:08:39,707] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_11_optim_states.pt [2021-11-04 20:08:39,708] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_09_optim_states.pt [2021-11-04 20:08:39,708] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_10_optim_states.pt [2021-11-04 20:08:39,708] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_07_optim_states.pt [2021-11-04 20:08:39,709] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_04_optim_states.pt [2021-11-04 20:08:39,709] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_10_optim_states.pt [2021-11-04 20:08:39,710] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_08_optim_states.pt [2021-11-04 20:08:39,714] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_04_optim_states.pt [2021-11-04 20:08:39,717] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_05_optim_states.pt [2021-11-04 20:08:39,727] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_11_optim_states.pt [2021-11-04 20:08:39,728] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_06_optim_states.pt [2021-11-04 20:08:39,729] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_06_optim_states.pt [2021-11-04 20:08:39,730] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_08_optim_states.pt [2021-11-04 20:08:39,731] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_09_optim_states.pt [2021-11-04 20:08:39,734] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_09_optim_states.pt [2021-11-04 20:08:39,734] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_04_optim_states.pt [2021-11-04 20:08:39,735] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_07_optim_states.pt [2021-11-04 20:08:39,735] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_09_optim_states.pt [2021-11-04 20:08:39,736] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_11_optim_states.pt [2021-11-04 20:08:39,736] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_05_optim_states.pt [2021-11-04 20:08:39,736] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_06_optim_states.pt [2021-11-04 20:08:39,738] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_05_optim_states.pt [2021-11-04 20:08:39,741] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_10_optim_states.pt [2021-11-04 20:08:39,742] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_06_optim_states.pt [2021-11-04 20:08:39,746] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_08_optim_states.pt [2021-11-04 20:08:39,826] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-04 20:08:39,831] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_02_optim_states.pt [2021-11-04 20:08:39,834] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_14_optim_states.pt [2021-11-04 20:08:39,834] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_13_optim_states.pt [2021-11-04 20:08:39,835] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-04 20:08:39,835] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_14_optim_states.pt [2021-11-04 20:08:39,839] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_02_optim_states.pt [2021-11-04 20:08:39,840] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_13_optim_states.pt [2021-11-04 20:08:39,841] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_15_optim_states.pt [2021-11-04 20:08:39,847] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_14_optim_states.pt [2021-11-04 20:08:39,847] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_15_optim_states.pt [2021-11-04 20:08:39,847] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-04 20:08:39,848] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-04 20:08:39,849] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_13_optim_states.pt [2021-11-04 20:08:39,853] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-04 20:08:39,858] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_03_optim_states.pt [2021-11-04 20:08:39,864] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_12_optim_states.pt [2021-11-04 20:08:39,868] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_13_optim_states.pt [2021-11-04 20:08:39,871] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_14_optim_states.pt [2021-11-04 20:08:39,876] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-04 20:08:39,876] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_12_optim_states.pt [2021-11-04 20:08:39,877] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_12_optim_states.pt [2021-11-04 20:08:39,877] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_12_optim_states.pt [2021-11-04 20:08:39,878] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_15_optim_states.pt [2021-11-04 20:08:39,879] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_03_optim_states.pt [2021-11-04 20:08:39,879] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_15_optim_states.pt [2021-11-04 20:08:39,879] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_02_optim_states.pt [2021-11-04 20:08:39,881] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-04 20:08:39,884] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_03_optim_states.pt [2021-11-04 20:08:39,885] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_03_optim_states.pt [2021-11-04 20:08:39,887] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_02_optim_states.pt [2021-11-04 20:08:39,889] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_01_optim_states.pt successfully saved checkpoint at iteration 39000 to /gpfsscratch/rech/six/commun/checkpoints/tr6-1B3-prefix-lm-unbiased-loss/checkpoints time (ms) | save-checkpoint: 1132.57 iteration 
39200/ 152972 | consumed samples: 14990784 | consumed tokens: 30701125632 | elapsed time per iteration (ms): 7188.0 | learning rate: 1.815E-04 | global batch size: 512 | lm loss: 2.043959E+00 | loss scale: 2097152.0 | grad norm: 163340.761 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 39400/ 152972 | consumed samples: 15093184 | consumed tokens: 30910840832 | elapsed time per iteration (ms): 6078.2 | learning rate: 1.812E-04 | global batch size: 512 | lm loss: 2.050927E+00 | loss scale: 524288.0 | grad norm: 39812.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 39600/ 152972 | consumed samples: 15195584 | consumed tokens: 31120556032 | elapsed time per iteration (ms): 6080.7 | learning rate: 1.810E-04 | global batch size: 512 | lm loss: 2.059096E+00 | loss scale: 524288.0 | grad norm: 36877.013 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 39800/ 152972 | consumed samples: 15297984 | consumed tokens: 31330271232 | elapsed time per iteration (ms): 6096.6 | learning rate: 1.807E-04 | global batch size: 512 | lm loss: 2.034562E+00 | loss scale: 524288.0 | grad norm: 40947.509 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [2021-11-04 21:50:08,652] [INFO] [logging.py:68:log_dist] [Rank 0] step=40000, skipped=81, lr=[0.0001804599959837998, 0.0001804599959837998], mom=[(0.9, 0.999), (0.9, 0.999)] iteration 40000/ 152972 | consumed samples: 15400384 | consumed tokens: 31539986432 | elapsed time per iteration (ms): 6080.1 | learning rate: 1.805E-04 | global batch size: 512 | lm loss: 2.046368E+00 | loss scale: 1048576.0 | grad norm: 79641.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | steps: 40000 loss: 1.9234 iter time (s): 0.003 samples/sec: 168896.024 ------------------------------------------------------------------------------------------------- 
validation loss at iteration 40000 | lm loss value: 2.018202E+00 | lm loss PPL: 7.524780E+00 | ------------------------------------------------------------------------------------------------- iteration 40200/ 152972 | consumed samples: 15502784 | consumed tokens: 31749701632 | elapsed time per iteration (ms): 7223.0 | learning rate: 1.802E-04 | global batch size: 512 | lm loss: 2.039917E+00 | loss scale: 1048576.0 | grad norm: 82712.329 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | srun: Job step aborted: Waiting up to 62 seconds for job step to finish. Killing subprocess 122540 Killing subprocess 1284095 Killing subprocess 908359 Killing subprocess 252675 Killing subprocess 1964291 Killing subprocess 502625 Killing subprocess 2721736 Killing subprocess 122541 Killing subprocess 803909 Killing subprocess 1284096 Killing subprocess 1964292 Killing subprocess 908360 Killing subprocess 2133881 Killing subprocess 532510 Killing subprocess 252676 Killing subprocess 2791463 Killing subprocess 1284097 Killing subprocess 122542 Killing subprocess 908361 Killing subprocess 502626 Killing subprocess 1964293 Killing subprocess 803910 Killing subprocess 2721737 Killing subprocess 1288896 Killing subprocess 252677 Killing subprocess 122543 Killing subprocess 1284099 Killing subprocess 502627 slurmstepd: error: *** STEP 1825190.0 ON r6i3n0 CANCELLED AT 2021-11-04T22:24:00 *** Killing subprocess 908362 Killing subprocess 532511 Killing subprocess 2721738 Killing subprocess 2133882 Killing subprocess 1964294 Main process received SIGTERM, exiting Killing subprocess 2791464 Killing subprocess 532512 Killing subprocess 2133883 Killing subprocess 502629 Killing subprocess 803911 Killing subprocess 803912 Killing subprocess 532513 Killing subprocess 2791465 Main process received SIGTERM, exiting Killing subprocess 1288897 Killing subprocess 2133884 Killing subprocess 252678 Main process received SIGTERM, exiting Main process received SIGTERM, 
exiting Main process received SIGTERM, exiting Killing subprocess 2721739 Main process received SIGTERM, exiting Killing subprocess 2791466 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 1288898 Killing subprocess 1288899 Main process received SIGTERM, exiting Killing subprocess 980734 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 980735 Killing subprocess 980736 Main process received SIGTERM, exiting Killing subprocess 980737 Main process received SIGTERM, exiting Killing subprocess 429407 Killing subprocess 2991515 Killing subprocess 179727 Killing subprocess 429408 Killing subprocess 2991516 Killing subprocess 179728 Killing subprocess 429409 Killing subprocess 429411 Killing subprocess 2991517 Killing subprocess 2991518 Main process received SIGTERM, exiting Main process received SIGTERM, exiting Killing subprocess 179729 Killing subprocess 179730 Main process received SIGTERM, exiting