*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
..............................[YES] [OKAY][YES]...... [YES]......[OKAY] ......[OKAY]stochastic_transformer [OKAY]. stochastic_transformer [YES].stochastic_transformer ......[YES].stochastic_transformer [OKAY]......[YES]. [OKAY]......[YES] [OKAY]...... [OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report--------------------------------------------------DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................................... .................. 
..................[OKAY][OKAY][OKAY] [OKAY]-------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------op name op nameop name................ op name ................................ installed ................ ..installed installedcompatibleinstalled ..-------------------------------------------------- .. ..compatiblecompatible compatible---------------------------------------------------------------------------------------------------- cpu_adam-------------------------------------------------- ............... [YES] ......cpu_adam cpu_adam [OKAY] ...............cpu_adam............... [YES]...............[YES] fused_adam[YES]............ ................... [OKAY][OKAY] [YES] [OKAY]...... [OKAY] fused_lambfused_adam ..........................fused_adamfused_adam [YES] [YES] ............. ................... ...... [OKAY][YES][OKAY] [YES] ............ [OKAY]fused_lamb[OKAY] .............fused_lamb fused_lamb[YES] ..........................sparse_attn...... [YES]............[OKAY][YES] [NO]............ .......[OKAY][OKAY] [OKAY] transformer sparse_attn............ ............[YES] [NO]...... .......[OKAY]sparse_attn sparse_attn[OKAY]............ stochastic_transformer[NO]transformer............ . ................... [NO] [YES][YES][OKAY] ................... [OKAY][OKAY][OKAY]transformer ............ [YES] stochastic_transformer......transformer .............[OKAY] [YES][YES] ......stochastic_transformer...... [OKAY]. [OKAY][YES] ...... [OKAY] stochastic_transformer . [YES] ...... 
[OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name op nameop name op name................ ................ 
................................ installedinstalled installed .. ..installed..compatible -------------------------------------------------- compatible..compatible cpu_adamcompatible-------------------------------------------------- -------------------------------------------------- ............... -------------------------------------------------- [YES]cpu_adam ..................... cpu_adam [OKAY] [YES] ...............cpu_adam...... [YES]...............[OKAY] [YES]fused_adam...... .............fused_adam...... [OKAY] [YES] [OKAY]............. ......fused_adam [YES] ............. [OKAY] ...... [YES] [OKAY]......fused_lamb [OKAY].............fused_adam fused_lamb [YES]............. [YES] fused_lamb................... ............. [YES][OKAY][YES] ............ [OKAY] [OKAY]...... [OKAY] sparse_attn ............sparse_attn sparse_attn [NO] ............ fused_lamb............ ....... [OKAY] [NO][NO]transformer ....... [OKAY] ............. ............ .......[YES] transformer [OKAY] [YES] ............ ...... ...... [OKAY][YES]transformer[OKAY] .................. [OKAY][YES] stochastic_transformer ......stochastic_transformer. [OKAY].[YES] [YES]...... stochastic_transformer[OKAY]...... .[OKAY] [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. 
-------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** ninjaninjaninjaninja ........................................................................ 
[OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------op name op name ................op nameop name................ ................................installedinstalled installed installed.. .... ..compatiblecompatiblecompatible compatible -------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam...............cpu_adamcpu_adam ..............................[YES] ............... [YES] ...... [YES] [YES][OKAY] ...... ...... ......[OKAY] [OKAY] fused_adam[OKAY] ............. [YES] ...... [OKAY]fused_adam fused_adam fused_adam.............fused_lamb ............. .............[YES]............. [YES] [YES]............[YES] ......[OKAY][OKAY]...... [OKAY][OKAY] fused_lamb fused_lambfused_lamb............. ..........................[YES] [YES][YES]...... ............[OKAY]sparse_attn [OKAY] [OKAY] ............ [NO] ....... [OKAY] transformer ............ [YES] sparse_attn......sparse_attn sparse_attn ........................ [OKAY] [NO]............ [NO] ....... [NO] stochastic_transformer....... [OKAY] ....... [OKAY]. [OKAY][YES] transformer transformer..................transformer [YES] ............[OKAY]............ [YES]......[YES] ......[OKAY]...... [OKAY][OKAY] stochastic_transformer stochastic_transformer.stochastic_transformer .[YES]. [YES]......[YES] [OKAY]............ [OKAY][OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io requires the dev libaio .so object and headers but these were not found. ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja-------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................. .................................... .................. 
[OKAY] [OKAY][OKAY] [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- --------------------------------------------------op nameop nameop name op name................................................ ................installedinstalled installedinstalled.. .. ....compatible compatible compatible compatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adamcpu_adam ...............cpu_adam.............................. ...............[YES][YES] [YES][YES] ...... ...... ............ [OKAY][OKAY] [OKAY][OKAY] fused_adamfused_adam .......................... fused_adamfused_adam[YES][YES] ................................ ...... [OKAY][YES] [YES] [OKAY] ............ fused_lamb [OKAY] [OKAY] ............. fused_lamb [YES]fused_lambfused_lamb............. .............[YES] ...................[YES] ......[OKAY] [YES] ...... ......[OKAY][OKAY] [OKAY] sparse_attn ............ sparse_attn[NO]sparse_attn sparse_attn............................... ............[OKAY][NO][NO] [NO] ..............transformer....... ............[OKAY][OKAY][OKAY] [YES] ......transformertransformer transformer [OKAY] ........................ ............ [YES][YES][YES] ...... stochastic_transformer ............ [OKAY] .[OKAY][OKAY] [YES]stochastic_transformer ....... stochastic_transformerstochastic_transformer [OKAY] [YES] ........ [YES][YES][OKAY] ............ [OKAY] [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
[WARNING]  async_io: please install the libaio-devel package with yum
[WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. 
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] transformer_inference .. [NO] ....... [OKAY] quantizer .............. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY]  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found... [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] async_io ............... quantizer[NO] ..................... [NO][NO] ....... [OKAY] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. 
...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** ---------------------------------------------------------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** ninjaninjaninjaninja .................................... ..................[OKAY]..................[OKAY] [OKAY][OKAY]-------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op name................op nameop name ................installed................................ installed ..installedinstalled.. ..compatible ..compatible -------------------------------------------------- compatiblecompatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ............... cpu_adam[YES] cpu_adamcpu_adam............... ...... [YES]............... ............... [OKAY]...... [YES][YES][OKAY] ............ fused_adam[OKAY][OKAY] ............. [YES] fused_adam...... .............fused_adam [OKAY]fused_adam [YES] ................... fused_lamb.............[YES] .............[OKAY][YES]...... [YES]......[OKAY] fused_lamb[OKAY]...... fused_lamb ............. [OKAY] ............. [YES]fused_lamb [YES]................... ...... [OKAY] [YES] [OKAY] ...... [OKAY]sparse_attn ............ [NO] ....... [OKAY] transformer sparse_attn............ sparse_attn............[YES] sparse_attn ............[NO].................. [NO].......[OKAY] [NO]....... [OKAY] .......[OKAY]stochastic_transformer transformer [OKAY] ............. transformer[YES][YES]transformer .................................... [YES][YES][OKAY] [OKAY] ............ [OKAY][OKAY] stochastic_transformer . 
using world size: 64, data-parallel-size: 32, tensor-model-parallel size: 1, pipeline-model-parallel size: 2
using torch.float16 for parameters ...
------------------------ arguments ------------------------
  accumulate_allreduce_grads_in_fp32 .............. False
  adam_beta1 ...................................... 0.9
  adam_beta2 ...................................... 0.999
  adam_eps ........................................ 1e-08
  adlr_autoresume ................................. False
  adlr_autoresume_interval ........................ 1000
  apply_query_key_layer_scaling ................... True
  apply_residual_connection_post_layernorm ........ False
  attention_dropout ............................... 0.1
  attention_softmax_in_fp32 ....................... False
  bert_binary_head ................................ True
  bert_load ....................................... None
  bf16 ............................................ False
  bias_dropout_fusion ............................. True
  bias_gelu_fusion ................................ True
  biencoder_projection_dim ........................ 0
  biencoder_shared_query_context_model ............ False
  block_data_path ................................. None
  checkpoint_activations .......................... True
  checkpoint_in_cpu ............................... False
  checkpoint_num_layers ........................... 1
  clip_grad ....................................... 1.0
  codecarbon_dir .................................. None
  consumed_train_samples .......................... 0
  consumed_train_tokens ........................... 0
  consumed_valid_samples .......................... 0
  contigious_checkpointing ........................ False
  cpu_optimizer ................................... False
  cpu_torch_adam .................................. False
  curriculum_learning ............................. False
  data_impl ....................................... mmap
  data_parallel_size .............................. 32
  data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
  dataloader_type ................................. single
  DDP_impl ........................................ local
  decoder_seq_length .............................. None
  deepscale ....................................... False
  deepscale_config ................................ None
  deepspeed ....................................... True
  deepspeed_activation_checkpointing .............. True
  deepspeed_config ................................ ./ds_config.2097147.json
  deepspeed_mpi ................................... False
  distribute_checkpointed_activations ............. False
  distributed_backend ............................. nccl
  embedding_path .................................. None
  encoder_seq_length .............................. 2048
  eod_mask_loss ................................... False
  eval_interval ................................... 1000
  eval_iters ...................................... 100
  evidence_data_path .............................. None
  exit_duration_in_mins ........................... 1190
  exit_interval ................................... None
  ffn_hidden_size ................................. 8192
  finetune ........................................ False
  fp16 ............................................ True
  fp16_lm_cross_entropy ........................... False
  fp32_residual_connection ........................ False
  gigaflos_no_embeds .............................. 0
  global_batch_size ............................... 512
  glu_activation .................................. None
  hidden_dropout .................................. 0.1
  hidden_size ..................................... 2048
  hysteresis ...................................... 2
  ict_head_size ................................... None
  ict_load ........................................ None
  img_dim ......................................... 224
  indexer_batch_size .............................. 128
  indexer_log_interval ............................ 1000
  init_method_std ................................. 0.02
  init_method_xavier_uniform ...................... False
  initial_loss_scale .............................. 4294967296
  kv_channels ..................................... 128
  layernorm_epsilon ............................... 1e-05
  lazy_mpu_init ................................... None
  load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
  local_rank ...................................... 0
  log_batch_size_to_tensorboard ................... True
  log_interval .................................... 200
  log_learning_rate_to_tensorboard ................ True
  log_level ....................................... None
  log_level_replica ............................... None
  log_loss_scale_to_tensorboard ................... True
  log_num_zeros_in_grad ........................... False
  log_params_norm ................................. False
  log_timers_to_tensorboard ....................... True
  log_validation_ppl_to_tensorboard ............... True
  loss_on_targets_only ............................ True
  loss_scale ...................................... 12.0
  loss_scale_window ............................... 1000
  lr .............................................. 0.0002
  lr_decay_iters .................................. None
  lr_decay_samples ................................ 73242187
  lr_decay_style .................................. cosine
  lr_decay_tokens ................................. None
  lr_warmup_fraction .............................. None
  lr_warmup_iters ................................. 0
  lr_warmup_samples ............................... 183105
  make_vocab_size_divisible_by .................... 128
  mask_prob ....................................... 0.15
  masked_softmax_fusion ........................... True
  max_position_embeddings ......................... 2048
  memory_centric_tiled_linear ..................... False
  merge_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/data/gpt2-merges.txt
  micro_batch_size ................................ 1
  min_loss_scale .................................. 1.0
  min_lr .......................................... 1e-05
  mmap_warmup ..................................... False
  no_load_optim ................................... None
  no_load_rng ..................................... None
  no_save_optim ................................... None
  no_save_rng ..................................... None
  num_attention_heads ............................. 16
  num_channels .................................... 3
  num_classes ..................................... 1000
  num_layers ...................................... 24
  num_layers_per_virtual_pipeline_stage ........... None
  num_workers ..................................... 2
  onnx_safe ....................................... None
  openai_gelu ..................................... False
  optimizer ....................................... adam
  override_lr_scheduler ........................... False
  params_dtype .................................... torch.float16
  partition_activations ........................... False
  patch_dim ....................................... 16
  pipeline_model_parallel_size .................... 2
  position_embedding_type ......................... PositionEmbeddingType.absolute
  profile_backward ................................ False
  query_in_block_prob ............................. 0.1
  rampup_batch_size ............................... ['32', '32', '2_000_000']
  rank ............................................ 0
  remote_device ................................... none
  reset_attention_mask ............................ False
  reset_position_ids .............................. False
  retriever_report_topk_accuracies ................ []
  retriever_score_scaling ......................... False
  retriever_seq_length ............................ 256
  reweight_loss_based_on_position_frequency ....... True
  sample_rate ..................................... 1.0
  save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
  save_interval ................................... 1500
  scatter_gather_tensors_in_pipeline .............. True
  scattered_embeddings ............................ False
  seed ............................................ 1234
  seq_length ...................................... 2048
  sgd_momentum .................................... 0.9
  short_seq_prob .................................. 0.1
  split ........................................... 949,50,1
  split_transformers .............................. False
  synchronize_each_layer .......................... False
  tensor_model_parallel_size ...................... 1
  tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/tr6g-1B3-oscar-loss-reweighting-logs/tensorboard
  tensorboard_log_interval ........................ 1
  tensorboard_queue_size .......................... 5
  test_weighted_split_names ....................... None
  test_weighted_split_paths ....................... None
  test_weighted_split_splits ...................... None
  test_weighted_split_weights ..................... None
  tile_factor ..................................... 1
  titles_data_path ................................ None
  tokenizer_name_or_path .......................... None
  tokenizer_type .................................. GPT2BPETokenizer
  train_iters ..................................... None
  train_samples ................................... 73242187
  train_tokens .................................... None
  train_weighted_split_paths ...................... None
  use_bnb_optimizer ............................... False
  use_checkpoint_lr_scheduler ..................... False
  use_contiguous_buffers_in_ddp ................... False
  use_cpu_initialization .......................... None
  use_one_sent_docs ............................... False
  use_pin_memory .................................. False
  valid_weighted_split_names ...................... None
  valid_weighted_split_paths ...................... None
  valid_weighted_split_splits ..................... None
  valid_weighted_split_weights .................... None
  virtual_pipeline_model_parallel_size ............ None
  vocab_extra_ids ................................. 0
  vocab_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/data/gpt2-vocab.json
  weight_decay .................................... 0.1
  world_size ...................................... 64
  zero_allgather_bucket_size ...................... 0.0
  zero_contigious_gradients ....................... False
  zero_reduce_bucket_size ......................... 0.0
  zero_reduce_scatter ............................. False
  zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
> building GPT2BPETokenizer tokenizer ...
[NO] ....... [NO]  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. transformer_inference .. [NO] ....... [OKAY] async_io utils............... ..................[NO] [YES]....... ......[NO] [OKAY] quantizer .............. [NO] ....... [OKAY] transformer_inference .. [NO]-------------------------------------------------- ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch install pathtorch version ................................... 1.8.2 torch cuda version ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']............... 11.1 torch versionnvcc version ......................................... 1.8.211.2 deepspeed install pathtorch cuda version .......................... 11.1['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] nvcc versiondeepspeed info ........................................ 
11.20.5.5+58a8e13, 58a8e13, master deepspeed install pathdeepspeed wheel compiled w. ................. torch 1.8, cuda 11.1['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) > initializing torch distributed ... 
[WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
> setting tensorboard ...
> initializing tensor model parallel with size 1
> initializing pipeline model parallel with size 2
> setting random seeds to 1234 ...
[2021-11-21 15:26:34,533] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.336 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension.
Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. 
Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
>>> done with compiling and loading fused kernels. Compilation time: 23.270 seconds
time to initialize megatron (seconds): 50.182
[after megatron is initialized] datetime: 2021-11-21 15:26:58
building GPT model ...
[2021-11-21 15:26:58,225] [INFO] [utils.py:806:see_memory_usage] Before Building Model
[2021-11-21 15:26:58,226] [INFO] [utils.py:807:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-11-21 15:26:58,226] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 38.75 GB, percent = 20.7%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pipe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=1, data=0, model=0): 32, ProcessCoord(pipe=1, data=1, model=0): 33, ProcessCoord(pipe=1, data=2, model=0): 34, ProcessCoord(pipe=1, data=3, model=0): 35, ProcessCoord(pipe=1, data=4, model=0): 36, ProcessCoord(pipe=1, data=5, model=0): 37, ProcessCoord(pipe=1, data=6, model=0): 38, ProcessCoord(pipe=1, data=7, model=0): 39, ProcessCoord(pipe=1, data=8, model=0): 40, ProcessCoord(pipe=1, data=9, model=0): 41, ProcessCoord(pipe=1, data=10, model=0): 42, ProcessCoord(pipe=1, data=11, model=0): 43, ProcessCoord(pipe=1, data=12, model=0): 44, ProcessCoord(pipe=1, data=13, model=0): 45, ProcessCoord(pipe=1, data=14, model=0): 46, ProcessCoord(pipe=1, data=15, model=0): 47, ProcessCoord(pipe=1, data=16, model=0): 48, ProcessCoord(pipe=1, data=17, model=0): 49, ProcessCoord(pipe=1, data=18, model=0): 50, ProcessCoord(pipe=1, data=19, model=0): 51, ProcessCoord(pipe=1, data=20, model=0): 52, ProcessCoord(pipe=1, data=21, model=0): 53, ProcessCoord(pipe=1, data=22, model=0): 54, ProcessCoord(pipe=1, data=23, model=0): 55, ProcessCoord(pipe=1, data=24, model=0): 56, ProcessCoord(pipe=1, data=25, model=0): 57, ProcessCoord(pipe=1, data=26, model=0): 58, ProcessCoord(pipe=1, data=27, model=0): 59, ProcessCoord(pipe=1, data=28, model=0): 60, ProcessCoord(pipe=1, data=29, model=0): 61, ProcessCoord(pipe=1, data=30, model=0): 62, ProcessCoord(pipe=1, data=31, model=0): 63}
[2021-11-21 15:26:59,524] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=15
    0: _to_float16
    1: EmbeddingPipe
    2:
    3: ParallelTransformerLayerPipe
    4: ParallelTransformerLayerPipe
    5: ParallelTransformerLayerPipe
    6: ParallelTransformerLayerPipe
    7: ParallelTransformerLayerPipe
    8: ParallelTransformerLayerPipe
    9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=1 layers=17
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
    27:
    28:
    29: MixedFusedLayerNorm
    30: EmbeddingPipe
    31: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 711520256
[2021-11-21 15:27:00,081] [INFO] [utils.py:806:see_memory_usage] After Building Model
[2021-11-21 15:27:00,081] [INFO] [utils.py:807:see_memory_usage] MA 1.33 GB Max_MA 1.33 GB CA 1.36 GB Max_CA 1 GB
[2021-11-21 15:27:00,082] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 38.98 GB, percent = 20.8%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 711516160
setting training iterations to 152972
> learning rate decay style: cosine
DeepSpeed is enabled.
[2021-11-21 15:27:00,101] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.5.5+58a8e13, git-hash=58a8e13, git-branch=master
[2021-11-21 15:27:00,405] [INFO] [engine.py:207:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-11-21 15:27:00,405] [INFO] [engine.py:862:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-11-21 15:27:00,405] [INFO] [engine.py:868:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-11-21 15:27:00,411] [INFO] [engine.py:884:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-11-21 15:27:00,411] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-11-21 15:27:00,411] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-11-21 15:27:00,411] [INFO] [stage2.py:111:__init__] Reduce bucket size 500000000
[2021-11-21 15:27:00,411] [INFO] [stage2.py:112:__init__] Allgather bucket size 500000000
[2021-11-21 15:27:00,411] [INFO] [stage2.py:113:__init__] CPU Offload: False
[2021-11-21 15:27:00,411] [INFO] [stage2.py:114:__init__] Round robin gradient partitioning: False
Rank: 51 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 27 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 55 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 35 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 10 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 58 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 62 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 14 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 42 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 3 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 38 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 36 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 60 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 22 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 57 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 47 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 46 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 34 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 20 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 5 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 11 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 25 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 29 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 7 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 43 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 17 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 30 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 15 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 18 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 41 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 40 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 48 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 32 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 45 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 33 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 8 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 24 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 52 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 1 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 44 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 21 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 49 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 53 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 16 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 13 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 0 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 4 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 9 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 28 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 26 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 56 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 63 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 23 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 12 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 54 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 50 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 61 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 19 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 59 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 6 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 2 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 31 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 39 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 37 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
[2021-11-21 15:27:02,059] [INFO] [utils.py:806:see_memory_usage] Before initializing optimizer states
[2021-11-21 15:27:02,059] [INFO] [utils.py:807:see_memory_usage] MA 1.41 GB Max_MA 1.45 GB CA 2.71 GB Max_CA 3 GB
[2021-11-21 15:27:02,060] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.75 GB, percent = 21.8%
[2021-11-21 15:27:02,091] [INFO] [utils.py:806:see_memory_usage] After initializing optimizer states
[2021-11-21 15:27:02,092] [INFO] [utils.py:807:see_memory_usage] MA 1.57 GB Max_MA 1.66 GB CA 2.96 GB Max_CA 3 GB
[2021-11-21 15:27:02,092] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.86 GB, percent = 21.8%
[2021-11-21 15:27:02,092] [INFO] [stage2.py:474:__init__] optimizer state initialized
[2021-11-21 15:27:02,119] [INFO] [utils.py:806:see_memory_usage] After initializing ZeRO optimizer
[2021-11-21 15:27:02,120] [INFO] [utils.py:807:see_memory_usage] MA 1.57 GB Max_MA 1.57 GB CA 2.96 GB Max_CA 3 GB
[2021-11-21 15:27:02,120] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.93 GB, percent = 21.9%
[2021-11-21 15:27:02,120] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-11-21 15:27:02,120] [INFO] [engine.py:599:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-11-21 15:27:02,120] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-11-21 15:27:02,120] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-11-21 15:27:02,121] [INFO] [config.py:940:print] DeepSpeedEngine configuration:
[2021-11-21 15:27:02,121] [INFO] [config.py:944:print] activation_checkpointing_config {
    "partition_activations": false,
    "contiguous_memory_optimization": false,
    "cpu_checkpointing": false,
    "number_checkpoints": null,
    "synchronize_checkpoint_boundary": false,
    "profile": false
}
[2021-11-21 15:27:02,121] [INFO] [config.py:944:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-11-21 15:27:02,121] [INFO] [config.py:944:print] allreduce_always_fp32 ........ False
[2021-11-21 15:27:02,121] [INFO] [config.py:944:print] amp_enabled .................. False
[2021-11-21 15:27:02,121] [INFO] [config.py:944:print] amp_params ................... False
[2021-11-21 15:27:02,121] [INFO] [config.py:944:print] checkpoint_tag_validation_enabled True
[2021-11-21 15:27:02,121] [INFO] [config.py:944:print] checkpoint_tag_validation_fail False
[2021-11-21 15:27:02,121] [INFO] [config.py:944:print] curriculum_enabled ........... False
[2021-11-21 15:27:02,121] [INFO] [config.py:944:print] curriculum_params ............ False
[2021-11-21 15:27:02,121] [INFO] [config.py:944:print] dataloader_drop_last ......... False
[2021-11-21 15:27:02,121] [INFO] [config.py:944:print] disable_allgather ............ False
[2021-11-21 15:27:02,121] [INFO] [config.py:944:print] dump_state ................... False
[2021-11-21 15:27:02,121] [INFO] [config.py:944:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-11-21 15:27:02,121] [INFO] [config.py:944:print] eigenvalue_enabled ........... False
[2021-11-21 15:27:02,121] [INFO] [config.py:944:print] eigenvalue_gas_boundary_resolution 1
[2021-11-21 15:27:02,121] [INFO] [config.py:944:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-11-21 15:27:02,121] [INFO] [config.py:944:print] eigenvalue_layer_num ......... 0
[2021-11-21 15:27:02,121] [INFO] [config.py:944:print] eigenvalue_max_iter .......... 100
[2021-11-21 15:27:02,121] [INFO] [config.py:944:print] eigenvalue_stability ......... 1e-06
[2021-11-21 15:27:02,121] [INFO] [config.py:944:print] eigenvalue_tol ............... 0.01
[2021-11-21 15:27:02,121] [INFO] [config.py:944:print] eigenvalue_verbose ........... False
[2021-11-21 15:27:02,121] [INFO] [config.py:944:print] elasticity_enabled ........... False
[2021-11-21 15:27:02,121] [INFO] [config.py:944:print] flops_profiler_config ........ {
    "enabled": false,
    "profile_step": 1,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": null
}
[2021-11-21 15:27:02,121] [INFO] [config.py:944:print] fp16_enabled ................. True
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] fp16_master_weights_and_gradients False
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] fp16_mixed_quantize .......... False
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] global_rank .................. 0
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] gradient_accumulation_steps .. 16
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] gradient_clipping ............ 1.0
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] gradient_predivide_factor .... 1.0
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] initial_dynamic_scale ........ 4096
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] loss_scale ................... 0
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] memory_breakdown ............. False
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] optimizer_legacy_fusion ...... False
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] optimizer_name ............... None
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] optimizer_params ............. None
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] pld_enabled .................. False
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] pld_params ................... False
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] prescale_gradients ........... False
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] quantize_change_rate ......... 0.001
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] quantize_groups .............. 1
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] quantize_offset .............. 1000
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] quantize_period .............. 1000
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] quantize_rounding ............ 0
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] quantize_start_bits .......... 16
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] quantize_target_bits ......... 8
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] quantize_training_enabled .... False
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] quantize_type ................ 0
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] quantize_verbose ............. False
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] scheduler_name ............... None
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] scheduler_params ............. None
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] sparse_attention ............. None
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] sparse_gradients_enabled ..... False
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] steps_per_print .............. 2000
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] tensorboard_enabled .......... False
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] tensorboard_output_path ......
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] train_batch_size ............. 512
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] train_micro_batch_size_per_gpu 1
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] use_quantizer_kernel ......... False
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] wall_clock_breakdown ......... False
[2021-11-21 15:27:02,122] [INFO] [config.py:944:print] world_size ................... 32
[2021-11-21 15:27:02,123] [INFO] [config.py:944:print] zero_allow_untested_optimizer False
[2021-11-21 15:27:02,123] [INFO] [config.py:944:print] zero_config .................. {
    "stage": 1,
    "contiguous_gradients": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 5.000000e+08,
    "allgather_partitions": true,
    "allgather_bucket_size": 5.000000e+08,
    "overlap_comm": false,
    "load_from_fp32_weights": true,
    "elastic_checkpoint": true,
    "offload_param": null,
    "offload_optimizer": null,
    "sub_group_size": 1.000000e+09,
    "prefetch_bucket_size": 5.000000e+07,
    "param_persistence_threshold": 1.000000e+05,
    "max_live_parameters": 1.000000e+09,
    "max_reuse_distance": 1.000000e+09,
    "gather_fp16_weights_on_model_save": false,
    "ignore_unused_parameters": true,
    "round_robin_gradients": false,
    "legacy_stage1": false
}
[2021-11-21 15:27:02,123] [INFO] [config.py:944:print] zero_enabled ................. True
[2021-11-21 15:27:02,123] [INFO] [config.py:944:print] zero_optimization_stage ...... 1
[2021-11-21 15:27:02,123] [INFO] [config.py:946:print] json = {
    "train_micro_batch_size_per_gpu": 1,
    "train_batch_size": 512,
    "gradient_clipping": 1.0,
    "zero_optimization": {
        "stage": 1
    },
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 500,
        "hysteresis": 2,
        "min_loss_scale": 1,
        "initial_scale_power": 12
    },
    "steps_per_print": 2.000000e+03,
    "wall_clock_breakdown": false
}
[2021-11-21 15:27:02,123] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=16 micro_batch_size=1
[2021-11-21 15:27:02,152] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=15 [0, 15) STAGE_PARAMS=711516160 (711.516M) TOTAL_PARAMS=1423036416 (1423.036M) UNIQUE_PARAMS=1315819520 (1315.820M)
[2021-11-21 15:27:02,152] [INFO] [engine.py:151:__init__] RANK=32 STAGE=1 LAYERS=17 [15, 32) STAGE_PARAMS=711520256 (711.520M) TOTAL_PARAMS=1423036416 (1423.036M) UNIQUE_PARAMS=1315819520 (1315.820M)
[2021-11-21 15:27:02,184] [WARNING] [engine.py:2025:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
[2021-11-21 15:27:02,184] [WARNING] [engine.py:2025:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
[2021-11-21 15:27:02,184] [WARNING] [engine.py:2025:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
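Note on the figures logged above: the batch and parameter numbers can be cross-checked against each other. The sketch below is not part of the run. It only copies values from the log and recomputes them, assuming that the logged world_size of 32 is the data-parallel size (the topology shows 64 ranks over 2 pipeline stages) and that the per-rank ZeRO partition sizes carry no padding. All function and variable names are illustrative and are not DeepSpeed APIs.

```python
# Values copied from the log above; names are illustrative only.
MICRO_BATCH_SIZE = 1        # train_micro_batch_size_per_gpu
GRAD_ACCUM_STEPS = 16       # gradient_accumulation_steps / micro_batches
DATA_PARALLEL_SIZE = 32     # logged world_size (assumed to be the data-parallel size)
TRAIN_BATCH_SIZE = 512      # train_batch_size

ZERO_PARTITIONS = 32        # "partition count [32, 32]"
PARTITION_SIZES = {         # per-rank sizes of the two parameter groups, by pipeline stage
    0: (22224896, 9984),
    1: (22224896, 10112),
}
STAGE_PARAMS = {0: 711516160, 1: 711520256}
TOTAL_PARAMS = 1423036416
UNIQUE_PARAMS = 1315819520


def global_batch_size(micro_batch, accum_steps, dp_size):
    """Samples consumed per optimizer step across all data-parallel replicas."""
    return micro_batch * accum_steps * dp_size


def stage_params_from_partitions(sizes, partitions):
    """Parameters on one pipeline stage, rebuilt from one rank's partition sizes."""
    return sum(sizes) * partitions


computed_batch = global_batch_size(MICRO_BATCH_SIZE, GRAD_ACCUM_STEPS, DATA_PARALLEL_SIZE)
computed_stage_params = {
    stage: stage_params_from_partitions(sizes, ZERO_PARTITIONS)
    for stage, sizes in PARTITION_SIZES.items()
}
# Parameters counted on more than one stage (TOTAL minus UNIQUE).
shared_params = TOTAL_PARAMS - UNIQUE_PARAMS
# initial_scale_power = 12 in the fp16 section of the JSON config.
initial_loss_scale = 2 ** 12

print(computed_batch, computed_stage_params, shared_params, initial_loss_scale)
```

For these values the recomputed batch size matches train_batch_size, the per-stage sums match STAGE_PARAMS, and 2**12 matches initial_dynamic_scale. The gap between TOTAL_PARAMS and UNIQUE_PARAMS is consistent with some parameters being counted on both stages (EmbeddingPipe appears in both stage listings), though the log itself does not say which ones.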
[2021-11-21 15:27:02,184] [WARNING] [engine.py:2025:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-11-21 15:27:02,185] [WARNING] [engine.py:2025:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-11-21 15:27:02,185] [WARNING] [engine.py:2025:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-11-21 15:27:02,185] [WARNING] [engine.py:2025:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-11-21 15:27:02,185] [WARNING] [engine.py:2025:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. [2021-11-21 15:27:02,185] [WARNING] [engine.py:2025:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint. 
[2021-11-21 15:27:02,186] [WARNING] [engine.py:2025:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
WARNING: could not find the metadata file /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
    will not load any checkpoints and will start from random
[2021-11-21 15:27:02,186] [WARNING] [engine.py:2025:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
[2021-11-21 15:27:02,186] [WARNING] [engine.py:2025:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
[2021-11-21 15:27:02,186] [WARNING] [engine.py:2025:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
[2021-11-21 15:27:02,186] [WARNING] [engine.py:2025:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
[2021-11-21 15:27:02,186] [WARNING] [engine.py:2025:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
[2021-11-21 15:27:02,186] [WARNING] [engine.py:2025:load_checkpoint] Unable to find latest file at /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/latest, if trying to load latest checkpoint please ensure this file exists or pass an explicit checkpoint tag when loading a checkpoint.
time (ms) | load-checkpoint: 2.97
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the 
embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP 
> 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") 
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the 
embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
estimated model parameters: 1.42303232
estimated model parameters without embeddings: 1.208598528
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-11-21 15:27:02
> building train, validation, and test datasets ...
 > datasets target sizes (minimum size):
    train: 73242187
    validation: 7833600
    test: 51200
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 0.078274 seconds
    number of documents: 304230423
 > dataset split:
    train:
     document indices in [0, 288714672) total of 288714672 documents
    validation:
     document indices in [288714672, 303926193) total of 15211521 documents
    test:
     document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.183 seconds
    total number of samples: 131537224
    total number of epochs: 1
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from
/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.186 seconds
    total number of samples: 13854322
    total number of epochs: 2
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.070 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-11-21 15:27:14
done with setup ...
training ...
time (ms) | model-and-optimizer-setup: 4001.49 | train/valid/test-data-iterators-setup: 12611.89
Number of parameters: 1.42303232 billion
Number of parameters: 1.423040512 billion
Number of parameters without embeddings: 1.208598528 billion
Number of parameters without embeddings: 1.20860672 billion
[before the start of training step] datetime: 2021-11-21 15:27:14
[2021-11-21 15:27:14,868] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information
[2021-11-21 15:27:14,868] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-11-21 15:27:14,868] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-11-21 15:27:14,868] [INFO] [checkpointing.py:554:forward] ----Synchronization False
[2021-11-21 15:27:14,868] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False
/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is
not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
  if t.grad is not None:
[Rank 0] (after 200 iterations) memory (MB) | allocated: 1623.6025390625 | max allocated: 3921.2119140625 | reserved: 6536.0 | max reserved: 6536.0
[Rank 32] (after 200 iterations) memory (MB) | allocated: 2032.60498046875 | max allocated: 4315.22216796875 | reserved: 6912.0 | max reserved: 6912.0
 iteration 200/ 152972 | consumed samples: 6400 | consumed tokens: 13107200 | elapsed time per iteration (ms): 1223.5 | learning rate: 6.991E-06 | global batch size: 32 | lm loss: 8.000997E+00 | loss scale: 4096.0 | grad norm: 8557.358 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration 400/ 152972 | consumed samples: 12800 | consumed tokens: 26214400 | elapsed time per iteration (ms): 1207.3 | learning rate: 1.398E-05 | global batch size: 32 | lm loss: 7.550038E+00 | loss scale: 4096.0 | grad norm: 7682.559 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration 600/ 152972 | consumed samples: 19200 | consumed tokens: 39321600 | elapsed time per iteration (ms): 1208.1 | learning rate: 2.097E-05 | global batch size: 32 | lm loss: 7.190775E+00 | loss scale: 8192.0 | grad norm: 2846.571 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration 800/ 152972 | consumed samples: 25600 | consumed tokens: 52428800 | elapsed time per iteration (ms): 1205.1 | learning rate: 2.796E-05 | global batch size: 32 | lm loss: 6.859697E+00 | loss scale: 8192.0 | grad norm: 302.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
 iteration 1000/ 152972 | consumed samples: 32000 | consumed tokens: 65536000 | elapsed time per iteration (ms): 1201.8 | learning rate: 3.492E-05 | global batch size: 32 | lm loss: 6.291536E+00 | loss scale: 8192.0 | grad norm: 4123.940 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
------------------------------------------------------------------------------------------ valid loss at iteration 1000 | lm loss value: 6.374531E+00 | lm loss PPL: 5.867104E+02 | ------------------------------------------------------------------------------------------ iteration 1200/ 152972 | consumed samples: 38400 | consumed tokens: 78643200 | elapsed time per iteration (ms): 1274.2 | learning rate: 4.187E-05 | global batch size: 32 | lm loss: 6.006076E+00 | loss scale: 4096.0 | grad norm: 8403.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 1400/ 152972 | consumed samples: 44800 | consumed tokens: 91750400 | elapsed time per iteration (ms): 1205.5 | learning rate: 4.886E-05 | global batch size: 32 | lm loss: 6.228642E+00 | loss scale: 4096.0 | grad norm: 111.034 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | saving checkpoint at iteration 1500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-21 15:57:42,695] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/mp_rank_00_model_states.pt [2021-11-21 15:57:43,126] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,132] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,133] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,135] [INFO] 
[engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,138] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,139] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,148] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,159] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,161] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,163] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,165] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,166] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,167] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,170] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,170] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,171] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,177] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,177] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,177] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,177] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,177] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,177] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,178] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,178] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,180] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,180] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,180] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,180] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,182] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,183] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,184] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,184] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,184] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,184] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,184] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,185] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,188] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,188] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,189] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,189] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,191] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,191] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,195] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,196] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,196] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,199] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,199] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,202] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,202] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,202] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,204] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,206] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,206] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,208] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,208] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,210] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,210] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,213] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,219] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,221] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,221] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-21 15:57:43,225] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,227] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-21 15:57:43,259] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step1500/zero_pp_rank_0_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 1500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2806.02 iteration 1600/ 152972 | consumed samples: 51200 | consumed tokens: 104857600 | elapsed time per iteration (ms): 1217.7 | learning rate: 5.585E-05 | global batch size: 32 | lm loss: 6.641904E+00 | loss scale: 4096.0 | grad norm: 5419.811 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 1800/ 152972 | consumed samples: 57600 | consumed tokens: 117964800 | elapsed time per iteration (ms): 1208.6 | learning rate: 6.284E-05 | global batch size: 32 | lm loss: 5.691072E+00 | loss scale: 8192.0 | grad norm: 466.732 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [2021-11-21 16:07:49,018] [INFO] [logging.py:68:log_dist] [Rank 0] step=2000, skipped=2, 
lr=[6.983534037847136e-05, 6.983534037847136e-05], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 2000 loss: 0.2402 iter time (s): 0.001 samples/sec: 53550.719 iteration 2000/ 152972 | consumed samples: 64000 | consumed tokens: 131072000 | elapsed time per iteration (ms): 1220.0 | learning rate: 6.984E-05 | global batch size: 32 | lm loss: 6.077307E+00 | loss scale: 8192.0 | grad norm: 443.082 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------ valid loss at iteration 2000 | lm loss value: 5.686119E+00 | lm loss PPL: 2.947475E+02 | ------------------------------------------------------------------------------------------ iteration 2200/ 152972 | consumed samples: 70400 | consumed tokens: 144179200 | elapsed time per iteration (ms): 1287.5 | learning rate: 7.683E-05 | global batch size: 32 | lm loss: 5.696931E+00 | loss scale: 16384.0 | grad norm: 2455.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 2400/ 152972 | consumed samples: 76800 | consumed tokens: 157286400 | elapsed time per iteration (ms): 1230.5 | learning rate: 8.382E-05 | global batch size: 32 | lm loss: 5.189986E+00 | loss scale: 16384.0 | grad norm: 24346.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 2600/ 152972 | consumed samples: 83200 | consumed tokens: 170393600 | elapsed time per iteration (ms): 1229.1 | learning rate: 9.081E-05 | global batch size: 32 | lm loss: 4.307329E+00 | loss scale: 16384.0 | grad norm: 24643.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 2800/ 152972 | consumed samples: 89600 | consumed tokens: 183500800 | elapsed time per iteration (ms): 1206.9 | learning rate: 9.776E-05 | global batch size: 32 | lm loss: 3.807464E+00 | loss scale: 32768.0 | grad norm: 91479.285 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | iteration 3000/ 152972 | consumed samples: 96000 | consumed tokens: 196608000 | elapsed time per iteration (ms): 1224.5 | learning rate: 1.047E-04 | global batch size: 32 | lm loss: 3.095487E+00 | loss scale: 16384.0 | grad norm: 13177.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------ valid loss at iteration 3000 | lm loss value: 3.027255E+00 | lm loss PPL: 2.064050E+01 | ------------------------------------------------------------------------------------------ saving checkpoint at iteration 3000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-21 16:28:40,415] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/mp_rank_00_model_states.pt [2021-11-21 16:28:40,839] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,839] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,843] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-21 16:28:40,845] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-21 16:28:40,849] [INFO] 
[engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-21 16:28:40,848] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-21 16:28:40,849] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-21 16:28:40,850] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,851] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-21 16:28:40,852] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-21 16:28:40,853] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,853] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,853] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,853] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-21 16:28:40,853] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,854] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-21 16:28:40,854] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,854] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-21 16:28:40,854] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,854] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,855] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-21 16:28:40,855] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-21 16:28:40,857] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,858] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,858] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,861] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-21 16:28:40,861] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,861] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,863] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-21 16:28:40,865] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,866] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-21 16:28:40,869] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,871] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-21 16:28:40,878] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,879] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-21 16:28:40,879] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,880] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-21 16:28:40,880] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-21 16:28:40,881] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-21 16:28:40,882] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-21 16:28:40,882] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,884] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-21 16:28:40,885] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,885] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,886] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,886] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,887] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,887] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-21 16:28:40,887] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,887] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,888] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-21 16:28:40,889] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,889] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-21 16:28:40,890] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,891] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-21 16:28:40,893] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,893] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-21 16:28:40,898] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-21 16:28:40,902] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,906] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-21 16:28:40,910] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_17_mp_rank_00_optim_states.pt
[2021-11-21 16:28:40,914] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_24_mp_rank_00_optim_states.pt
[2021-11-21 16:28:40,915] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_14_mp_rank_00_optim_states.pt
[2021-11-21 16:28:40,915] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step3000/zero_pp_rank_26_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 3000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2646.36
iteration 3200/ 152972 | consumed samples: 102400 | consumed tokens: 209715200 | elapsed time per iteration (ms): 1289.2 | learning rate: 1.117E-04 | global batch size: 32 | lm loss: 2.745905E+00 | loss scale: 16384.0 | grad norm: 4305.084 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 3400/ 152972 | consumed samples: 108800 | consumed tokens: 222822400 | elapsed time per iteration (ms): 1205.2 | learning rate: 1.187E-04 | global batch size: 32 | lm loss: 3.028870E+00 | loss scale: 16384.0 | grad norm: 1016.029 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 3600/ 152972 | consumed samples: 115200 | consumed tokens: 235929600 | elapsed time per iteration (ms): 1217.4 | learning rate: 1.257E-04 | global batch size: 32 | lm loss: 2.755419E+00 | loss scale: 32768.0 | grad norm: 1725.461 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 3800/ 152972 | consumed samples: 121600 | consumed tokens: 249036800 | elapsed time per iteration (ms): 1229.5 | learning rate: 1.326E-04 | global batch size: 32 | lm loss: 3.037871E+00 | loss scale: 32768.0 | grad norm: 3261.985 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-21 16:48:56,098] [INFO] [logging.py:68:log_dist] [Rank 0] step=4000, skipped=5, lr=[0.0001396357281341307, 0.0001396357281341307], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 4000 loss: 0.7130 iter time (s): 0.001 samples/sec: 53376.907
iteration 4000/ 152972 | consumed samples: 128000 | consumed tokens: 262144000 | elapsed time per iteration (ms): 1215.6 | learning rate: 1.396E-04 | global batch size: 32 | lm loss: 2.708987E+00 | loss scale: 65536.0 | grad norm: 8811.091 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
------------------------------------------------------------------------------------------
valid loss at iteration 4000 | lm loss value: 2.719706E+00 | lm loss PPL: 1.517586E+01 |
------------------------------------------------------------------------------------------
iteration 4200/ 152972 | consumed samples: 135456 | consumed tokens: 277413888 | elapsed time per iteration (ms): 1303.0 | learning rate: 1.476E-04 | global batch size: 64 | lm loss: 2.699131E+00 | loss scale: 32768.0 | grad norm: 9476.396 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 4400/ 152972 | consumed samples: 148256 | consumed tokens: 303628288 | elapsed time per iteration (ms): 1450.8 | learning rate: 1.616E-04 | global batch size: 64 | lm loss: 2.488977E+00 | loss scale: 16384.0 | grad norm: 2218.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 4500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-21 17:00:33,734] [INFO] [logging.py:68:log_dist] [Rank 0]
Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/mp_rank_00_model_states.pt [2021-11-21 17:00:34,159] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,160] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,160] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,162] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,163] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,164] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,164] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,166] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,168] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,168] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,169] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,169] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,169] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,170] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,172] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,172] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,173] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,173] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,173] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,174] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,175] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,177] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,177] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,178] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,180] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,181] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,184] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,190] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,191] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,191] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,192] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,192] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,193] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,193] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,195] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,195] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,196] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,197] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,199] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,199] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,199] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,199] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,200] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,200] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,201] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,202] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,203] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,203] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,206] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,207] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,208] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,208] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,208] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,208] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,209] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,209] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,210] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,212] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-21 17:00:34,213] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,215] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,215] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,216] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,216] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-21 17:00:34,230] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step4500/zero_pp_rank_30_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 4500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2651.36
iteration 4600/ 152972 | consumed samples: 161056 | consumed tokens: 329842688 | elapsed time per iteration (ms): 1465.5 | learning rate: 1.755E-04 | global batch size: 64 | lm loss: 2.752822E+00 | loss scale: 16384.0 | grad norm: 3236.155 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 4800/ 152972 | consumed samples: 173856 | consumed tokens: 356057088 | elapsed time per iteration (ms): 1464.6 | learning rate: 1.895E-04 | global batch size: 64 | lm loss: 2.600099E+00 | loss scale: 32768.0 | grad norm: 4163.863 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 5000/ 152972 | consumed samples: 186656 | consumed tokens: 382271488 | elapsed time per iteration (ms): 1450.3 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 2.671476E+00 | loss scale: 32768.0 | grad norm: 4012.701 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
------------------------------------------------------------------------------------------
valid loss at iteration 5000 | lm loss value: 2.752070E+00 | lm loss PPL: 1.567505E+01 |
------------------------------------------------------------------------------------------
iteration 5200/ 152972 | consumed samples: 199456 | consumed tokens: 408485888 | elapsed time per iteration (ms): 1556.9 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 2.623164E+00 | loss scale: 8192.0 | grad norm: 2821.551 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 5400/ 152972 | consumed samples: 212256 | consumed tokens: 434700288 | elapsed time per iteration (ms): 1449.5 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 2.625780E+00 | loss scale: 8192.0 | grad norm: 1613.494 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 5600/ 152972 | consumed samples: 225056 | consumed tokens: 460914688 | elapsed time per iteration (ms): 1443.6 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 2.817862E+00 | loss scale: 8192.0 | grad norm: 7511.009 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 5800/ 152972 | consumed samples: 237856 | consumed tokens: 487129088 | elapsed time per iteration (ms): 1431.1 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 2.206845E+00 | loss scale: 16384.0 | grad norm: 858.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-21 17:37:03,803] [INFO] [logging.py:68:log_dist] [Rank 0] step=6000, skipped=12, lr=[0.0001999996064001037, 0.0001999996064001037], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 6000 loss: 4.4555 iter time (s): 0.001 samples/sec: 86319.980
iteration 6000/ 152972 | consumed samples: 250656 | consumed tokens: 513343488 | elapsed time per iteration (ms): 1423.3 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 2.448138E+00 | loss scale: 16384.0 | grad norm: 8444.682 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
------------------------------------------------------------------------------------------
valid loss at iteration 6000 | lm loss value: 2.334224E+00 | lm loss PPL: 1.032145E+01 |
------------------------------------------------------------------------------------------
saving checkpoint at iteration 6000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-21 17:37:27,501] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint:
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/mp_rank_00_model_states.pt [2021-11-21 17:37:27,932] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,932] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,933] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,935] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,938] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,938] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,938] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,938] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,938] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,940] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,941] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,941] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,941] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,941] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,942] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,942] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,942] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,943] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,945] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,947] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,947] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,948] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,948] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,950] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,951] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,951] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,953] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,956] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,956] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,957] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,958] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,958] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,959] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,963] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,964] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,965] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,965] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,966] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,967] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,967] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,969] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,969] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,969] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,971] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,972] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,972] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,972] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,972] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,972] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,974] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,974] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,974] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,976] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,976] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,977] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,977] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,978] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,979] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,980] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,980] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,980] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,982] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-21 17:37:27,985] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-21 17:37:27,986] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step6000/zero_pp_rank_6_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 6000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2698.61
iteration 6200/ 152972 | consumed samples: 263456 | consumed tokens: 539557888 | elapsed time per iteration (ms): 1535.9 | learning rate: 2.000E-04 | global batch size: 64 | lm loss: 2.498862E+00 | loss scale: 16384.0 | grad norm: 12579.824 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 6400/ 152972 | consumed samples: 281024 | consumed tokens: 575537152 | elapsed time per iteration (ms): 1613.2 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 2.385746E+00 | loss scale: 8192.0 | grad norm: 1476.734 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 6600/ 152972 | consumed samples: 300224 | consumed tokens: 614858752 | elapsed time per iteration (ms): 1672.4 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 2.369125E+00 | loss scale: 8192.0 | grad norm: 2724.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 6800/ 152972 | consumed samples: 319424 | consumed tokens: 654180352 | elapsed time per iteration (ms): 1653.9 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 2.221761E+00 | loss scale: 8192.0 | grad norm: 3239.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 7000/ 152972 | consumed samples: 338624 | consumed tokens: 693501952 | elapsed time per iteration (ms): 1661.0 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 2.449012E+00 | loss scale: 16384.0 | grad norm: 4136.806 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
------------------------------------------------------------------------------------------
valid loss at iteration 7000 | lm loss value: 2.129992E+00 | lm loss PPL: 8.414804E+00 |
------------------------------------------------------------------------------------------
iteration 7200/ 152972 | consumed samples: 357824 | consumed tokens: 732823552 | elapsed time per iteration (ms): 1795.6 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 2.253619E+00 | loss scale: 16384.0 | grad norm: 788.750 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 7400/ 152972 | consumed samples: 377024 | consumed tokens: 772145152 | elapsed time per iteration (ms): 1642.7 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 2.236107E+00 | loss scale: 32768.0 | grad norm: 10514.678 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 7500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-21 18:18:26,563] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/mp_rank_00_model_states.pt
[2021-11-21 18:18:26,986] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_2_mp_rank_01_optim_states.pt
[2021-11-21 18:18:26,987] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_3_mp_rank_00_optim_states.pt
[2021-11-21 18:18:26,990] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_2_mp_rank_00_optim_states.pt
[2021-11-21 18:18:26,991] 
[INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-21 18:18:26,992] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-21 18:18:26,994] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-21 18:18:26,994] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-21 18:18:26,994] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-21 18:18:26,995] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-21 18:18:26,995] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-21 18:18:26,996] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-21 18:18:26,996] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-21 18:18:26,997] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-21 18:18:26,997] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-21 18:18:26,998] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-21 18:18:26,998] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-21 18:18:27,000] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-21 18:18:27,001] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-21 18:18:27,001] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-21 18:18:27,001] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-21 18:18:27,001] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-21 18:18:27,002] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-21 18:18:27,003] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-21 18:18:27,003] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-21 18:18:27,006] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-21 18:18:27,006] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-21 18:18:27,007] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-21 18:18:27,009] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-21 18:18:27,010] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-21 18:18:27,012] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-21 18:18:27,015] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-21 18:18:27,021] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-21 18:18:27,023] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-21 18:18:27,024] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-21 18:18:27,025] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-21 18:18:27,025] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-21 18:18:27,025] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-21 18:18:27,026] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-21 18:18:27,026] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-21 18:18:27,026] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-21 18:18:27,027] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-21 18:18:27,027] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-21 18:18:27,027] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-21 18:18:27,029] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-21 18:18:27,030] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-21 18:18:27,032] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-21 18:18:27,033] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-21 18:18:27,034] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-21 18:18:27,035] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-21 18:18:27,035] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-21 18:18:27,036] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-21 18:18:27,037] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-21 18:18:27,037] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-21 18:18:27,037] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-21 18:18:27,039] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-21 18:18:27,040] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-21 18:18:27,040] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-21 18:18:27,040] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-21 18:18:27,040] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-21 18:18:27,041] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_30_mp_rank_00_optim_states.pt
[2021-11-21 18:18:27,042] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_10_mp_rank_01_optim_states.pt
[2021-11-21 18:18:27,049] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_28_mp_rank_00_optim_states.pt
[2021-11-21 18:18:27,050] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_13_mp_rank_00_optim_states.pt
[2021-11-21 18:18:27,052] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step7500/zero_pp_rank_14_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 7500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2671.07
iteration 7600/ 152972 | consumed samples: 396224 | consumed tokens: 811466752 | elapsed time per iteration (ms): 1680.2 | learning rate: 2.000E-04 | global batch size: 96 | lm loss: 2.076308E+00 | loss scale: 32768.0 | grad norm: 4505.148 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 7800/ 152972 | consumed samples: 420544 | consumed tokens: 861274112 | elapsed time per iteration (ms): 1865.6 | learning rate: 2.000E-04 | global batch size: 128 | lm loss: 2.263915E+00 | loss scale: 32768.0 | grad norm: 7443.115 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-21 18:33:52,363] [INFO] [logging.py:68:log_dist] [Rank 0] step=8000, skipped=13, lr=[0.00019999395560550484, 0.00019999395560550484], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 8000 loss: 3.3672 iter time (s): 0.001 samples/sec: 133214.093
iteration 8000/ 152972 | consumed samples: 446144 | consumed tokens: 913702912 | elapsed time per iteration (ms): 1922.1 | learning rate: 2.000E-04 | global batch size: 128 | lm loss: 2.069185E+00 | loss scale: 65536.0 | grad norm: 14377.713 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
------------------------------------------------------------------------------------------
valid loss at iteration 8000 | lm loss value: 2.179860E+00 | lm loss PPL: 8.845069E+00 |
------------------------------------------------------------------------------------------
iteration 8200/ 152972 | consumed samples: 471744 | consumed tokens: 966131712 | elapsed time per iteration (ms): 2099.9 | learning rate: 2.000E-04 | global batch size: 128 | lm loss: 2.236168E+00 | loss scale: 32768.0 | grad norm: 8619.164 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 8400/ 152972 | consumed samples: 497344 | consumed tokens: 1018560512 | elapsed time per iteration (ms): 1926.9 | learning rate: 2.000E-04 | global batch size: 128 | lm loss: 1.936707E+00 | loss scale: 32768.0 | grad norm: 6752.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 8600/ 152972 | consumed samples: 522944 | consumed tokens: 1070989312 | elapsed time per iteration (ms): 1915.2 | learning rate: 2.000E-04 | global batch size: 128 | lm loss: 2.194406E+00 | loss scale: 16384.0 | grad norm: 3473.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 8800/ 152972 | consumed samples: 552320 | consumed tokens: 1131151360 | elapsed time per iteration (ms): 2059.6 | learning rate: 2.000E-04 | global batch size: 160 | lm loss: 2.153580E+00 | loss scale: 8192.0 | grad norm: 2528.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 9000/ 152972 | consumed samples: 584320 | consumed tokens: 1196687360 | elapsed time per iteration (ms): 2126.0 | learning rate: 2.000E-04 | global batch size: 160 | lm loss: 1.923214E+00 | loss scale: 8192.0 | grad norm: 1592.727 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
------------------------------------------------------------------------------------------
valid loss at iteration 9000 | lm loss value: 2.025576E+00 | lm loss PPL: 7.580475E+00 |
------------------------------------------------------------------------------------------
saving checkpoint at iteration 9000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-21 19:08:19,092] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/mp_rank_00_model_states.pt
[2021-11-21 19:08:19,506] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_23_mp_rank_00_optim_states.pt
[2021-11-21 19:08:19,509] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_2_mp_rank_00_optim_states.pt
[2021-11-21 19:08:19,513] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_27_mp_rank_01_optim_states.pt
[2021-11-21 19:08:19,514] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_3_mp_rank_00_optim_states.pt
[2021-11-21 19:08:19,517] [INFO] 
[engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,519] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-21 19:08:19,519] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-21 19:08:19,519] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,520] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-21 19:08:19,522] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,524] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-21 19:08:19,524] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,525] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,525] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,527] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,528] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-21 19:08:19,528] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,528] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,531] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-21 19:08:19,532] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-21 19:08:19,533] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,536] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-21 19:08:19,536] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,537] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,541] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-21 19:08:19,541] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-21 19:08:19,541] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-21 19:08:19,542] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,543] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,543] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-21 19:08:19,543] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,544] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,549] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-21 19:08:19,551] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,551] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,554] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-21 19:08:19,554] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-21 19:08:19,554] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,555] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-21 19:08:19,557] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-21 19:08:19,557] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-21 19:08:19,557] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-21 19:08:19,557] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,559] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,559] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-21 19:08:19,563] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-21 19:08:19,563] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-21 19:08:19,563] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,565] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,566] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-21 19:08:19,566] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,566] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-21 19:08:19,566] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,566] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,567] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-21 19:08:19,568] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-21 19:08:19,570] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,572] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,573] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,576] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-21 19:08:19,577] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_4_mp_rank_00_optim_states.pt
[2021-11-21 19:08:19,577] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_10_mp_rank_01_optim_states.pt
[2021-11-21 19:08:19,592] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_5_mp_rank_00_optim_states.pt
[2021-11-21 19:08:19,605] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step9000/zero_pp_rank_7_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 9000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2732.92
iteration 9200/ 152972 | consumed samples: 616320 | consumed tokens: 1262223360 | elapsed time per iteration (ms): 2314.3 | learning rate: 2.000E-04 | global batch size: 160 | lm loss: 2.129140E+00 | loss scale: 16384.0 | grad norm: 1983.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 9400/ 152972 | consumed samples: 648320 | consumed tokens: 1327759360 | elapsed time per iteration (ms): 2096.1 | learning rate: 2.000E-04 | global batch size: 160 | lm loss: 2.040066E+00 | loss scale: 16384.0 | grad norm: 2410.204 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 9600/ 152972 | consumed samples: 683040 | consumed tokens: 1398865920 | elapsed time per iteration (ms): 2195.4 | learning rate: 2.000E-04 | global batch size: 192 | lm loss: 2.148150E+00 | loss scale: 16384.0 | grad norm: 4765.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 9800/ 152972 | consumed samples: 721440 | consumed tokens: 1477509120 | elapsed time per iteration (ms): 2336.6 | learning rate: 2.000E-04 | global batch size: 192 | lm loss: 1.905918E+00 | loss scale: 16384.0 | grad norm: 2984.147 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-21 19:45:12,889] [INFO] [logging.py:68:log_dist] [Rank 0] step=10000, skipped=18, lr=[0.0001999709295138719, 0.0001999709295138719], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 10000 loss: 2.2851 iter time (s): 0.001 samples/sec: 164786.486
iteration 10000/ 152972 | consumed samples: 759840 | consumed tokens: 1556152320 | elapsed time per iteration (ms): 2332.9 | learning rate: 2.000E-04 | global batch size: 192 | lm loss: 1.959267E+00 | loss scale: 32768.0 | grad norm: 5758.651 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 10000 | lm loss value: 1.896366E+00 | lm loss PPL: 6.661640E+00 |
-------------------------------------------------------------------------------------------
iteration 10200/ 152972 | consumed samples: 798240 | consumed tokens: 1634795520 | elapsed time per iteration (ms): 2568.3 | learning rate: 2.000E-04 | global batch size: 192 | lm loss: 2.030863E+00 | loss scale: 32768.0 | grad norm: 2827.832 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 10400/ 152972 | consumed samples: 842720 | consumed tokens: 1725890560 | elapsed time per iteration (ms): 2599.5 | learning rate: 2.000E-04 | global batch size: 224 | lm loss: 1.946195E+00 | loss scale: 32768.0 | grad norm: 6204.797 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 10500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-21 20:06:46,510] [INFO]
[logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/mp_rank_00_model_states.pt [2021-11-21 20:06:46,936] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,937] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,938] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,939] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,939] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,940] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,940] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,941] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,943] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,943] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,943] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,943] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,944] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,944] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,946] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,946] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,947] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,948] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,951] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,951] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,952] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,954] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,954] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,955] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,956] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,957] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,959] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,961] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,963] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,970] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,970] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,970] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,970] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,972] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,972] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,973] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,974] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,974] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,975] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,976] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,976] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,976] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,977] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,978] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,978] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,978] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,980] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,981] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,985] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,985] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,985] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,986] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,987] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,988] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,988] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,989] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,989] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,991] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,992] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,992] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-21 20:06:46,996] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,996] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-21 20:06:46,997] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-21 20:06:47,012] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step10500/zero_pp_rank_12_mp_rank_01_optim_states.pt
successfully saved checkpoint at iteration 10500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2626.17
iteration 10600/ 152972 | consumed samples: 887520 | consumed tokens: 1817640960 | elapsed time per iteration (ms): 2593.8 | learning rate: 2.000E-04 | global batch size: 224 | lm loss: 1.934502E+00 | loss scale: 65536.0 | grad norm: 6969.783 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 10800/ 152972 | consumed samples: 932320 | consumed tokens: 1909391360 | elapsed time per iteration (ms): 2574.2 | learning rate: 2.000E-04 | global batch size: 224 | lm loss: 1.847305E+00 | loss scale: 65536.0 | grad norm: 8012.047 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 11000/ 152972 | consumed samples: 983360 | consumed tokens: 2013921280 | elapsed time per iteration (ms): 2810.7 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 1.869316E+00 | loss scale: 65536.0 | grad norm: 7426.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 11000 | lm loss value: 1.995048E+00 | lm loss PPL: 7.352556E+00 |
-------------------------------------------------------------------------------------------
iteration 11200/ 152972 | consumed samples: 1034560 | consumed tokens: 2118778880 | elapsed time per iteration (ms): 3128.3 | learning rate: 1.999E-04 | global batch size: 256 | lm loss: 1.912706E+00 | loss scale: 131072.0 | grad norm: 21127.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 11400/ 152972 | consumed samples: 1088128 | consumed tokens: 2228486144 | elapsed time per iteration (ms): 2904.4 | learning rate: 1.999E-04 | global batch size: 288 | lm loss: 1.913164E+00 | loss scale: 131072.0 | grad norm: 14915.610 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 11600/ 152972 | consumed samples: 1145728 | consumed tokens: 2346450944 | elapsed time per iteration (ms): 3062.8 | learning rate: 1.999E-04 | global batch size: 288 | lm loss: 1.981089E+00 | loss scale: 65536.0 | grad norm: 13602.116 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 11800/ 152972 | consumed samples: 1203680 | consumed tokens: 2465136640 | elapsed time per iteration (ms): 3078.5 | learning rate: 1.999E-04 | global batch size: 320 | lm loss: 1.909631E+00 | loss scale: 32768.0 | grad norm: 5439.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-21 21:20:42,668] [INFO] [logging.py:68:log_dist] [Rank 0] step=12000, skipped=23, lr=[0.0001998972270000547, 0.0001998972270000547], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 12000 loss: 1.1182 iter time (s): 0.002 samples/sec: 194174.364
iteration 12000/ 152972 | consumed samples: 1267680 | consumed tokens: 2596208640 | elapsed time per iteration (ms): 3328.4 | learning rate: 1.999E-04 | global batch size: 320 | lm loss: 1.866228E+00 | loss scale: 16384.0 | grad norm: 1428.191 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 12000 | lm loss value: 1.781525E+00 | lm loss PPL: 5.938909E+00 |
-------------------------------------------------------------------------------------------
saving checkpoint at iteration 12000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-21 21:21:59,706] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint:
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/mp_rank_00_model_states.pt [2021-11-21 21:22:00,125] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,127] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,131] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,134] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,134] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,134] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,135] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,136] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,137] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,137] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,137] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,137] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,138] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,138] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,138] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,138] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,139] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,139] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,140] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,140] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,142] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,143] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,143] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,145] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,146] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,146] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,146] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,147] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,148] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,153] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,156] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,159] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,161] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,161] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,163] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,164] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,166] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,166] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,168] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,169] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,169] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,169] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,170] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,170] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,170] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,172] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,172] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,173] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,174] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,174] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,176] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,176] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,176] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,177] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,177] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,179] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,181] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,181] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,182] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,184] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,185] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,191] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-21 21:22:00,193] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-21 21:22:00,216] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step12000/zero_pp_rank_10_mp_rank_01_optim_states.pt
successfully saved checkpoint at iteration 12000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2640.09
iteration 12200/ 152972 | consumed samples: 1331680 | consumed tokens: 2727280640 | elapsed time per iteration (ms): 3698.2 | learning rate: 1.999E-04 | global batch size: 320 | lm loss: 1.876355E+00 | loss scale: 16384.0 | grad norm: 2840.758 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 12400/ 152972 | consumed samples: 1401888 | consumed tokens: 2871066624 | elapsed time per iteration (ms): 3565.8 | learning rate: 1.999E-04 | global batch size: 352 | lm loss: 1.873858E+00 | loss scale: 32768.0 | grad norm: 4040.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 12600/ 152972 | consumed samples: 1472768 | consumed tokens: 3016228864 | elapsed time per iteration (ms): 3612.0 | learning rate: 1.999E-04 | global batch size: 384 | lm loss: 1.863848E+00 | loss scale: 32768.0 | grad norm: 3689.148 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 12800/ 152972 | consumed samples: 1549568 | consumed tokens: 3173515264 | elapsed time per iteration (ms): 3852.2 | learning rate: 1.998E-04 | global batch size: 384 | lm loss: 1.857054E+00 | loss scale: 32768.0 | grad norm: 3403.161 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 13000/ 152972 | consumed samples: 1628544 | consumed tokens: 3335258112 | elapsed time per iteration (ms): 3920.3 | learning rate: 1.998E-04 | global batch size: 416 | lm loss: 1.856108E+00 | loss scale: 65536.0 | grad norm: 11560.143 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 13000 | lm loss value: 1.888755E+00 | lm loss PPL: 6.611133E+00 |
-------------------------------------------------------------------------------------------
iteration 13200/ 152972 | consumed samples: 1711744 | consumed tokens: 3505651712 | elapsed time per iteration (ms): 4591.4 | learning rate: 1.998E-04 | global batch size: 416 | lm loss: 1.895724E+00 | loss scale: 32768.0 | grad norm: 4661.899 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 13400/ 152972 | consumed samples: 1799680 | consumed tokens: 3685744640 | elapsed time per iteration (ms): 4264.6 | learning rate: 1.998E-04 | global batch size: 448 | lm loss: 1.818847E+00 | loss scale: 32768.0 | grad norm: 3644.954 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 13500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-21 22:59:37,923] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/mp_rank_00_model_states.pt
[2021-11-21 22:59:38,346] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_3_mp_rank_00_optim_states.pt
[2021-11-21 22:59:38,348] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_27_mp_rank_01_optim_states.pt
[2021-11-21 22:59:38,351] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_13_mp_rank_01_optim_states.pt
[2021-11-21 22:59:38,354] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-21 22:59:38,357] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-21 22:59:38,357] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-21 22:59:38,359] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-21 22:59:38,359] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-21 22:59:38,360] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-21 22:59:38,360] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-21 22:59:38,360] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-21 22:59:38,361] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint 
saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-21 22:59:38,361] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-21 22:59:38,362] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-21 22:59:38,363] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-21 22:59:38,364] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-21 22:59:38,365] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-21 22:59:38,365] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-21 22:59:38,365] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-21 22:59:38,366] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-21 22:59:38,366] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-21 22:59:38,367] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-21 22:59:38,367] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-21 22:59:38,367] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-21 22:59:38,367] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-21 22:59:38,368] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-21 22:59:38,368] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-21 22:59:38,368] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-21 22:59:38,371] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-21 22:59:38,374] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-21 22:59:38,376] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-21 22:59:38,378] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-21 22:59:38,384] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-21 22:59:38,384] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-21 22:59:38,384] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-21 22:59:38,386] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-21 22:59:38,388] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-21 22:59:38,390] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-21 22:59:38,390] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-21 22:59:38,390] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-21 22:59:38,392] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-21 22:59:38,392] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-21 22:59:38,392] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-21 22:59:38,392] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-21 22:59:38,393] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-21 22:59:38,395] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-21 22:59:38,396] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-21 22:59:38,397] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-21 22:59:38,397] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-21 22:59:38,400] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-21 22:59:38,401] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-21 22:59:38,402] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-21 22:59:38,402] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-21 22:59:38,402] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-21 22:59:38,403] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-21 22:59:38,403] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-21 22:59:38,403] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-21 22:59:38,403] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-21 22:59:38,406] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-21 22:59:38,407] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_2_mp_rank_01_optim_states.pt
[2021-11-21 22:59:38,410] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_29_mp_rank_00_optim_states.pt
[2021-11-21 22:59:38,415] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_5_mp_rank_01_optim_states.pt
[2021-11-21 22:59:38,415] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_18_mp_rank_00_optim_states.pt
[2021-11-21 22:59:38,420] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step13500/zero_pp_rank_12_mp_rank_01_optim_states.pt
successfully saved checkpoint at iteration 13500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2648.28
iteration 13600/ 152972 | consumed samples: 1890880 | consumed tokens: 3872522240 | elapsed time per iteration (ms): 4389.5 | learning rate: 1.997E-04 | global batch size: 480 | lm loss: 1.901426E+00 | loss scale: 65536.0 | grad norm: 49729.580 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 13800/ 152972 | consumed samples: 1986880 | consumed tokens: 4069130240 | elapsed time per iteration (ms): 4554.2 | learning rate: 1.997E-04 | global batch size: 480 | lm loss: 1.868949E+00 | loss scale: 65536.0 | grad norm: 8049.069 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-21 23:38:03,749] [INFO] [logging.py:68:log_dist] [Rank 0] step=14000, skipped=25, lr=[0.00019968253251979363, 0.00019968253251979363], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 14000 loss: 1.7627 iter time (s): 0.002 samples/sec: 205015.216
iteration 14000/ 152972 | consumed samples: 2088384 | consumed tokens: 4277010432 | elapsed time per iteration (ms): 4757.2 | learning rate: 1.997E-04 | global batch size: 512 | lm loss: 1.789393E+00 | loss scale: 65536.0 | grad norm: 7917.298 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 14000 | lm loss value: 1.824114E+00 | lm loss PPL: 6.197300E+00 |
-------------------------------------------------------------------------------------------
iteration 14200/ 152972 | consumed samples: 2190784 | consumed tokens: 4486725632 | elapsed time per iteration (ms): 5385.4 | learning rate: 1.996E-04 | global batch size: 512 | lm loss: 1.759017E+00 | loss scale: 65536.0 | grad norm: 6786.384 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 14400/ 152972 | consumed samples: 2293184 | consumed tokens: 4696440832 | elapsed time per iteration (ms): 4770.7 | learning rate: 1.996E-04 | global batch size: 512 | lm loss: 1.793806E+00 | loss scale: 32768.0 | grad norm: 4531.498 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 14600/ 152972 | consumed samples: 2395584 | consumed tokens: 4906156032 | elapsed time per iteration (ms): 4769.2 | learning rate: 1.996E-04 | global batch size: 512 | lm loss: 1.784717E+00 | loss scale: 32768.0 | grad norm: 4486.444 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 14800/ 152972 | consumed samples: 2497984 | consumed tokens: 5115871232 | elapsed time per iteration (ms): 4760.8 | learning rate: 1.995E-04 | global batch size: 512 | lm loss: 1.795697E+00 | loss scale: 65536.0 | grad norm: 4467.986 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 15000/ 152972 | consumed samples: 2600384 | consumed tokens: 5325586432 | elapsed time per iteration (ms): 4760.4 | learning rate: 1.995E-04 | global batch size: 512 | lm loss: 1.758073E+00 | loss scale: 65536.0 | grad norm: 10565.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 15000 | lm loss value: 1.714850E+00 | lm loss PPL: 5.555842E+00 |
-------------------------------------------------------------------------------------------
saving checkpoint at iteration 15000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-22 01:01:32,538] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/mp_rank_00_model_states.pt
[2021-11-22 01:01:32,985] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_1_mp_rank_00_optim_states.pt
[2021-11-22 01:01:32,987] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_2_mp_rank_01_optim_states.pt
[2021-11-22 01:01:32,989] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_20_mp_rank_01_optim_states.pt
[2021-11-22 01:01:32,989] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_3_mp_rank_01_optim_states.pt
[2021-11-22 01:01:32,989] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-22 01:01:32,990] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-22 01:01:32,991] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-22 01:01:32,991] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-22 01:01:32,991] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-22 01:01:32,993] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-22 01:01:32,994] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-22 01:01:32,995] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-22 01:01:32,995] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint 
saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-22 01:01:32,995] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-22 01:01:32,996] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-22 01:01:32,997] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-22 01:01:32,997] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-22 01:01:32,999] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-22 01:01:32,999] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-22 01:01:32,999] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-22 01:01:33,001] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-22 01:01:33,002] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-22 01:01:33,002] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-22 01:01:33,005] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-22 01:01:33,005] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-22 01:01:33,005] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-22 01:01:33,008] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-22 01:01:33,009] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-22 01:01:33,011] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-22 01:01:33,014] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-22 01:01:33,015] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-22 01:01:33,017] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-22 01:01:33,017] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-22 01:01:33,017] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-22 01:01:33,018] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-22 01:01:33,020] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-22 01:01:33,021] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-22 01:01:33,021] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-22 01:01:33,021] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-22 01:01:33,022] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-22 01:01:33,023] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-22 01:01:33,024] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-22 01:01:33,026] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-22 01:01:33,026] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-22 01:01:33,026] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-22 01:01:33,027] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-22 01:01:33,027] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-22 01:01:33,027] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-22 01:01:33,028] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-22 01:01:33,029] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-22 01:01:33,030] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-22 01:01:33,032] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-22 01:01:33,033] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-22 01:01:33,034] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-22 01:01:33,039] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-22 01:01:33,041] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-22 01:01:33,043] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-22 01:01:33,044] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-22 01:01:33,046] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-22 01:01:33,048] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-22 01:01:33,051] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_15_mp_rank_01_optim_states.pt
[2021-11-22 01:01:33,058] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_26_mp_rank_00_optim_states.pt
[2021-11-22 01:01:33,059] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_16_mp_rank_00_optim_states.pt
[2021-11-22 01:01:33,066] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step15000/zero_pp_rank_23_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 15000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2705.06
iteration 15200/ 152972 | consumed samples: 2702784 | consumed tokens: 5535301632 | elapsed time per iteration (ms): 5363.9 | learning rate: 1.994E-04 | global batch size: 512 | lm loss: 1.796760E+00 | loss scale: 65536.0 | grad norm: 5562.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 15400/ 152972 | consumed samples: 2805184 | consumed tokens: 5745016832 | elapsed time per iteration (ms): 4775.8 | learning rate: 1.994E-04 | global batch size: 512 | lm loss: 1.700759E+00 | loss scale: 131072.0 | grad norm: 13814.524 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 15600/ 152972 | consumed samples: 2907584 | consumed tokens: 5954732032 | elapsed time per iteration (ms): 4772.6 | learning rate: 1.994E-04 | global batch size: 512 | lm loss: 1.709494E+00 | loss scale: 131072.0 | grad norm: 16734.844 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 15800/ 152972 | consumed samples: 3009984 | consumed tokens: 6164447232 | elapsed time per iteration (ms): 4792.1 | learning rate: 1.993E-04 | global batch size: 512 | lm loss: 1.743433E+00 | loss scale: 262144.0 | grad norm: 35466.167 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-22 02:21:09,673] [INFO] [logging.py:68:log_dist] [Rank 0] step=16000, skipped=28, lr=[0.00019924996055073444, 0.00019924996055073444], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 16000 loss: 1.3945 iter time (s): 0.002 samples/sec: 214805.965
iteration 16000/ 152972 | consumed samples: 3112384 | consumed tokens: 6374162432 | elapsed time per iteration (ms): 4778.7 | learning rate: 1.992E-04 | global batch size: 512 | lm loss: 1.708568E+00 | loss scale: 262144.0 | grad norm: 24855.175 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 16000 | lm loss value: 1.773496E+00 | lm loss PPL: 5.891413E+00 |
-------------------------------------------------------------------------------------------
iteration 16200/ 152972 | consumed samples: 3214784 | consumed tokens: 6583877632 | elapsed time per iteration (ms): 5407.6 | learning rate: 1.992E-04 | global batch size: 512 | lm loss: 1.770365E+00 | loss scale: 262144.0 | grad norm: 37669.091 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 16400/ 152972 | consumed samples: 3317184 | consumed tokens: 6793592832 | elapsed time per iteration (ms): 4806.2 | learning rate: 1.991E-04 | global batch size: 512 | lm loss: 1.778807E+00 | loss scale: 262144.0 | grad norm: 35223.145 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 16500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-22
03:03:14,982] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/mp_rank_00_model_states.pt [2021-11-22 03:03:15,422] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,425] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,443] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,444] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,449] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,450] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,450] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,453] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,454] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,455] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,456] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,461] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,463] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,464] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,464] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,466] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,467] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,467] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,467] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,467] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,468] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,470] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,474] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,474] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,474] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,474] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,475] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,476] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,476] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,477] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,478] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,478] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,480] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,482] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,484] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,487] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,488] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,489] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,490] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,490] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,490] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,492] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,494] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,495] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,495] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,496] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,497] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,497] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,497] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,497] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,497] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,502] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,503] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,504] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,506] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,506] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,506] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,507] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,508] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,511] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,512] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,513] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-22 03:03:15,514] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-22 03:03:15,515] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step16500/zero_pp_rank_10_mp_rank_01_optim_states.pt
successfully saved checkpoint at iteration 16500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 3004.25
iteration 16600/ 152972 | consumed samples: 3419584 | consumed tokens: 7003308032 | elapsed time per iteration (ms): 4815.0 | learning rate: 1.991E-04 | global batch size: 512 | lm loss: 1.797828E+00 | loss scale: 262144.0 | grad norm: 33866.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 16800/ 152972 | consumed samples: 3521984 | consumed tokens: 7213023232 | elapsed time per iteration (ms): 4782.7 | learning rate: 1.990E-04 | global batch size: 512 | lm loss: 1.768616E+00 | loss scale: 131072.0 | grad norm: 14182.941 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 17000/ 152972 | consumed samples: 3624384 | consumed tokens: 7422738432 | elapsed time per iteration (ms): 4773.0 | learning rate: 1.990E-04 | global batch size: 512 | lm loss: 1.753261E+00 | loss scale: 131072.0 | grad norm: 11407.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 17000 | lm loss value: 1.708653E+00 | lm loss PPL: 5.521517E+00 |
-------------------------------------------------------------------------------------------
iteration 17200/ 152972 | consumed samples: 3726784 | consumed tokens: 7632453632 | elapsed time per iteration (ms): 5360.7 | learning rate: 1.989E-04 | global batch size: 512 | lm loss: 1.636814E+00 | loss scale: 65536.0 | grad norm: 6671.819 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 17400/ 152972 | consumed samples: 3829184 | consumed tokens: 7842168832 | elapsed time per iteration (ms): 4769.1 | learning rate: 1.988E-04 | global batch size: 512 | lm loss: 1.717898E+00 | loss scale: 65536.0 | grad norm: 8626.681 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 17600/ 152972 | consumed samples: 3931584 | consumed tokens: 8051884032 | elapsed time per iteration (ms): 4773.4 | learning rate: 1.988E-04 | global batch size: 512 | lm loss: 1.789216E+00 | loss scale: 131072.0 | grad norm: 12195.196 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 17800/ 152972 | consumed samples: 4033984 | consumed tokens: 8261599232 | elapsed time per iteration (ms): 4792.7 | learning rate: 1.987E-04 | global batch size: 512 | lm loss: 1.705398E+00 | loss scale: 131072.0 | grad norm: 11856.830 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-22 05:04:42,469] [INFO] [logging.py:68:log_dist] [Rank 0] step=18000, skipped=32, lr=[0.00019863557947544174, 0.00019863557947544174], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 18000/ 152972 | consumed samples: 4136384 | consumed tokens: 8471314432 | elapsed time per iteration (ms): 4783.6 | learning rate: 1.986E-04 | global batch size: 512 | lm loss: 1.746141E+00 | loss scale: 131072.0 | grad norm: 11594.223 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
steps: 18000 loss: 1.3095 iter time (s): 0.002 samples/sec: 215721.268
-------------------------------------------------------------------------------------------
valid loss at iteration 18000 | lm loss value: 1.729535E+00 | lm loss PPL: 5.638032E+00 |
-------------------------------------------------------------------------------------------
saving checkpoint at iteration 18000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-22 05:06:43,138] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint:
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/mp_rank_00_model_states.pt [2021-11-22 05:06:43,562] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,564] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,568] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,569] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,570] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,570] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,570] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,571] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,571] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,572] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,572] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,573] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,573] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,573] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,573] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,574] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,574] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,575] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,576] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,576] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,577] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,578] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,579] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,579] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,579] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,580] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,580] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,581] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,581] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,587] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,588] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,593] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,597] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,598] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,602] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,602] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,602] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,603] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,603] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,604] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,605] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,606] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,607] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,609] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,610] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,611] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,612] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,613] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,614] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,615] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,616] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,616] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,616] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,617] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,617] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,618] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,621] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,622] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,623] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,624] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,624] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,625] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-22 05:06:43,628] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-22 05:06:43,637] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step18000/zero_pp_rank_30_mp_rank_01_optim_states.pt
successfully saved checkpoint at iteration 18000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2651.83
iteration 18200/ 152972 | consumed samples: 4238784 | consumed tokens: 8681029632 | elapsed time per iteration (ms): 5383.4 | learning rate: 1.986E-04 | global batch size: 512 | lm loss: 1.710389E+00 | loss scale: 262144.0 | grad norm: 30990.917 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 18400/ 152972 | consumed samples: 4341184 | consumed tokens: 8890744832 | elapsed time per iteration (ms): 4778.1 | learning rate: 1.985E-04 | global batch size: 512 | lm loss: 1.694784E+00 | loss scale: 262144.0 | grad norm: 37524.284 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 18600/ 152972 | consumed samples: 4443584 | consumed tokens: 9100460032 | elapsed time per iteration (ms): 4788.2 | learning rate: 1.984E-04 | global batch size: 512 | lm loss: 1.743873E+00 | loss scale: 262144.0 | grad norm: 26276.740 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 18800/ 152972 | consumed samples: 4545984 | consumed tokens: 9310175232 | elapsed time per iteration (ms): 4780.9 | learning rate: 1.983E-04 | global batch size: 512 | lm loss: 1.737783E+00 | loss scale: 131072.0 | grad norm: 12133.506 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 19000/ 152972 | consumed samples: 4648384 | consumed tokens: 9519890432 | elapsed time per iteration (ms): 4764.8 | learning rate: 1.983E-04 | global batch size: 512 | lm loss: 1.687130E+00 | loss scale: 16384.0 | grad norm: 1678.840 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 19000 | lm loss value: 1.626686E+00 | lm loss PPL: 5.086987E+00 |
-------------------------------------------------------------------------------------------
iteration 19200/ 152972 | consumed samples: 4750784 | consumed tokens: 9729605632 | elapsed time per iteration (ms): 5367.9 | learning rate: 1.982E-04 | global batch size: 512 | lm loss: 1.671367E+00 | loss scale: 16384.0 | grad norm: 1356.805 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 19400/ 152972 | consumed samples: 4853184 | consumed tokens: 9939320832 | elapsed time per iteration (ms): 4775.6 | learning rate: 1.981E-04 | global batch size: 512 | lm loss: 1.647027E+00 | loss scale: 32768.0 | grad norm: 4899.113 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 19500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-22 07:08:12,768] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/mp_rank_00_model_states.pt
[2021-11-22 07:08:13,184] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_00_optim_states.pt
[2021-11-22 07:08:13,196] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_21_mp_rank_01_optim_states.pt
[2021-11-22 07:08:13,196] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_18_mp_rank_00_optim_states.pt
[2021-11-22 07:08:13,196] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,197] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,198] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-22 07:08:13,199] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-22 07:08:13,201] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,201] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,202] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-22 07:08:13,202] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-22 07:08:13,202] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint 
saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-22 07:08:13,203] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-22 07:08:13,203] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,203] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-22 07:08:13,204] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,204] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-22 07:08:13,205] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,206] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,207] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,207] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,208] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,208] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,209] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-22 07:08:13,212] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-22 07:08:13,217] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-22 07:08:13,217] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,218] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,224] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-22 07:08:13,227] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-22 07:08:13,227] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,228] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,228] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,229] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,230] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-22 07:08:13,230] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-22 07:08:13,231] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-22 07:08:13,231] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,232] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,233] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-22 07:08:13,233] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,234] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-22 07:08:13,235] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-22 07:08:13,235] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-22 07:08:13,236] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-22 07:08:13,238] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-22 07:08:13,238] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,239] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-22 07:08:13,240] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-22 07:08:13,241] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,242] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,242] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,242] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,242] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-22 07:08:13,243] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-22 07:08:13,243] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-22 07:08:13,243] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,243] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,244] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,245] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,248] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-22 07:08:13,250] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-22 07:08:13,254] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-22 07:08:13,263] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step19500/zero_pp_rank_27_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 19500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2595.49 iteration 19600/ 152972 | consumed samples: 4955584 | consumed tokens: 10149036032 | elapsed time per iteration (ms): 4807.6 | learning rate: 1.980E-04 | global batch size: 512 | lm loss: 1.702057E+00 | loss scale: 32768.0 | grad norm: 2942.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 19800/ 152972 | consumed samples: 5057984 | consumed tokens: 10358751232 | elapsed time per iteration (ms): 4778.3 | learning rate: 1.979E-04 | global batch size: 512 | lm loss: 1.687568E+00 | loss scale: 32768.0 | grad norm: 3768.334 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [2021-11-22 07:48:04,145] [INFO] [logging.py:68:log_dist] [Rank 0] 
step=20000, skipped=38, lr=[0.00019784129145691303, 0.00019784129145691303], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 20000 loss: 0.7449 iter time (s): 0.002 samples/sec: 215028.816
iteration 20000/ 152972 | consumed samples: 5160384 | consumed tokens: 10568466432 | elapsed time per iteration (ms): 4783.8 | learning rate: 1.978E-04 | global batch size: 512 | lm loss: 1.706155E+00 | loss scale: 65536.0 | grad norm: 3962.539 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 20000 | lm loss value: 1.671937E+00 | lm loss PPL: 5.322469E+00 |
-------------------------------------------------------------------------------------------
iteration 20200/ 152972 | consumed samples: 5262784 | consumed tokens: 10778181632 | elapsed time per iteration (ms): 5385.4 | learning rate: 1.978E-04 | global batch size: 512 | lm loss: 1.708156E+00 | loss scale: 65536.0 | grad norm: 27481.523 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 20400/ 152972 | consumed samples: 5365184 | consumed tokens: 10987896832 | elapsed time per iteration (ms): 4773.5 | learning rate: 1.977E-04 | global batch size: 512 | lm loss: 1.652668E+00 | loss scale: 131072.0 | grad norm: 19858.278 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 20600/ 152972 | consumed samples: 5467584 | consumed tokens: 11197612032 | elapsed time per iteration (ms): 4793.2 | learning rate: 1.976E-04 | global batch size: 512 | lm loss: 1.664459E+00 | loss scale: 131072.0 | grad norm: 10487.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 20800/ 152972 | consumed samples: 5569984 | consumed tokens: 11407327232 | elapsed time per iteration (ms): 4792.8 | learning rate: 1.975E-04 | global batch size: 512 | lm loss: 1.676178E+00 | loss scale: 131072.0 |
grad norm: 15818.187 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 21000/ 152972 | consumed samples: 5672384 | consumed tokens: 11617042432 | elapsed time per iteration (ms): 4765.6 | learning rate: 1.974E-04 | global batch size: 512 | lm loss: 1.613849E+00 | loss scale: 131072.0 | grad norm: 8304.669 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------- valid loss at iteration 21000 | lm loss value: 1.646088E+00 | lm loss PPL: 5.186652E+00 | ------------------------------------------------------------------------------------------- saving checkpoint at iteration 21000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-22 09:12:24,792] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/mp_rank_00_model_states.pt [2021-11-22 09:12:25,214] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,215] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,216] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,219] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,219] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,224] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,225] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,227] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,227] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,228] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,229] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,229] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,231] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,232] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,232] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,232] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,233] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,233] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,235] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,235] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,236] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,239] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,242] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,242] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,243] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,243] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,244] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,245] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,249] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,250] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,251] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,252] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,253] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,254] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,255] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,255] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,255] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,258] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,259] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,259] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,260] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,262] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,263] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,263] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,264] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,264] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,264] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,264] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,265] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,265] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,266] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,266] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,267] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,267] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,268] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,269] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,272] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,274] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-22 09:12:25,280] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,282] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,285] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,291] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,294] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-22 09:12:25,348] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step21000/zero_pp_rank_0_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 21000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2754.20 iteration 21200/ 152972 | consumed samples: 5774784 | consumed tokens: 11826757632 | elapsed time per iteration (ms): 5569.3 | learning rate: 1.973E-04 | global batch size: 512 | lm loss: 1.666703E+00 | loss scale: 262144.0 | grad norm: 32256.183 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 21400/ 152972 | consumed samples: 5877184 | consumed tokens: 12036472832 | elapsed time per iteration (ms): 4793.0 | learning rate: 1.972E-04 | global batch size: 512 | lm loss: 1.673685E+00 | loss scale: 262144.0 | grad norm: 26457.786 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 21600/ 152972 | consumed samples: 5979584 | consumed 
tokens: 12246188032 | elapsed time per iteration (ms): 4798.1 | learning rate: 1.971E-04 | global batch size: 512 | lm loss: 1.708040E+00 | loss scale: 262144.0 | grad norm: 41392.282 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 21800/ 152972 | consumed samples: 6081984 | consumed tokens: 12455903232 | elapsed time per iteration (ms): 4816.2 | learning rate: 1.970E-04 | global batch size: 512 | lm loss: 1.728763E+00 | loss scale: 262144.0 | grad norm: 27061.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [2021-11-22 10:32:25,525] [INFO] [logging.py:68:log_dist] [Rank 0] step=22000, skipped=42, lr=[0.00019686703702517352, 0.00019686703702517352], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 22000 loss: 1.5972 iter time (s): 0.002 samples/sec: 213346.433 iteration 22000/ 152972 | consumed samples: 6184384 | consumed tokens: 12665618432 | elapsed time per iteration (ms): 4819.8 | learning rate: 1.969E-04 | global batch size: 512 | lm loss: 1.752445E+00 | loss scale: 262144.0 | grad norm: 33709.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------- valid loss at iteration 22000 | lm loss value: 1.709579E+00 | lm loss PPL: 5.526635E+00 | ------------------------------------------------------------------------------------------- iteration 22200/ 152972 | consumed samples: 6286784 | consumed tokens: 12875333632 | elapsed time per iteration (ms): 5549.2 | learning rate: 1.968E-04 | global batch size: 512 | lm loss: 1.666750E+00 | loss scale: 131072.0 | grad norm: 9516.732 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 22400/ 152972 | consumed samples: 6389184 | consumed tokens: 13085048832 | elapsed time per iteration (ms): 4835.6 | learning rate: 1.967E-04 | global batch size: 512 | lm loss: 1.614038E+00 | loss scale: 
131072.0 | grad norm: 12719.072 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | saving checkpoint at iteration 22500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-22 11:15:07,882] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/mp_rank_00_model_states.pt [2021-11-22 11:15:08,309] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,312] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,328] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,338] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,338] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,339] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,340] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,341] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,343] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,344] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,345] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,347] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,349] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,351] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,354] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,357] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,358] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,359] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,359] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,361] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,361] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,361] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,361] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,362] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,363] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,364] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,365] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,365] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,367] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,367] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,367] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,368] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,368] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,370] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,372] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,374] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,375] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,375] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,375] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,377] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,379] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,379] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,380] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,381] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,381] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,381] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,381] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,383] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,383] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,386] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,387] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,387] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,390] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,391] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,395] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,396] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,396] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,397] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,398] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,401] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,402] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,405] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-22 11:15:08,409] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-22 11:15:08,423] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22500/zero_pp_rank_25_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 22500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2936.68 saving checkpoint at iteration 22513 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-22 11:16:13,480] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/mp_rank_00_model_states.pt [2021-11-22 11:16:13,900] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-22 11:16:13,901] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-22 11:16:13,908] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-22 11:16:13,908] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-22 11:16:13,909] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-22 11:16:13,909] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-22 11:16:13,910] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-22 11:16:13,910] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-22 11:16:13,910] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-22 11:16:13,911] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-22 11:16:13,912] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-22 11:16:13,912] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-22 11:16:13,913] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-22 11:16:13,914] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-22 11:16:13,914] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-22 11:16:13,914] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-22 11:16:13,915] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-22 11:16:13,916] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-22 11:16:13,917] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-22 11:16:13,917] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-22 11:16:13,918] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-22 11:16:13,918] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-22 11:16:13,919] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-22 11:16:13,921] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-22 11:16:13,921] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-22 11:16:13,923] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-22 11:16:13,924] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-22 11:16:13,930] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-22 11:16:13,932] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-22 11:16:13,935] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-22 11:16:13,936] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-22 11:16:13,939] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-22 11:16:13,940] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-22 11:16:13,940] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-22 11:16:13,941] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-22 11:16:13,942] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-22 11:16:13,943] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-22 11:16:13,943] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-22 11:16:13,943] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-22 11:16:13,943] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-22 11:16:13,945] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-22 11:16:13,946] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-22 11:16:13,946] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-22 11:16:13,946] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-22 11:16:13,946] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_3_mp_rank_00_optim_states.pt
[2021-11-22 11:16:13,948] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_11_mp_rank_01_optim_states.pt
[2021-11-22 11:16:13,948] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_6_mp_rank_00_optim_states.pt
[2021-11-22 11:16:13,949] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_17_mp_rank_01_optim_states.pt
[2021-11-22 11:16:13,949] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_18_mp_rank_01_optim_states.pt
[2021-11-22 11:16:13,949] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_24_mp_rank_01_optim_states.pt
[2021-11-22 11:16:13,950] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_7_mp_rank_01_optim_states.pt
[2021-11-22 11:16:13,950] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_25_mp_rank_01_optim_states.pt
[2021-11-22 11:16:13,951] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_12_mp_rank_01_optim_states.pt
[2021-11-22 11:16:13,951] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_8_mp_rank_00_optim_states.pt
[2021-11-22 11:16:13,952] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_2_mp_rank_01_optim_states.pt
[2021-11-22 11:16:13,955] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_14_mp_rank_01_optim_states.pt
[2021-11-22 11:16:13,956] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_30_mp_rank_00_optim_states.pt
[2021-11-22 11:16:13,957] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_13_mp_rank_00_optim_states.pt
[2021-11-22 11:16:13,958] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_23_mp_rank_00_optim_states.pt
[2021-11-22 11:16:13,958] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_21_mp_rank_00_optim_states.pt
[2021-11-22 11:16:13,959] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_20_mp_rank_01_optim_states.pt
[2021-11-22 11:16:13,964] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_26_mp_rank_00_optim_states.pt
[2021-11-22 11:16:13,968] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_31_mp_rank_00_optim_states.pt
[2021-11-22 11:16:13,972] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step22513/zero_pp_rank_18_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 22513 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2661.19
[exiting program after 1190.0596614519754 minutes] datetime: 2021-11-22 11:16:14
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ 
[OKAY][OKAY][OKAY] [OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop name op name ................................ ................ ................installed installedinstalled installed.. compatible.. .. ..compatible -------------------------------------------------- compatible compatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ............... cpu_adam[YES]cpu_adam cpu_adam..................... .............................. [OKAY] [YES][YES] [YES] .................. [OKAY] fused_adam[OKAY] [OKAY] ............. [YES] ...... fused_adam[OKAY] fused_adam............. fused_adam .............fused_lamb[YES] ............. ......[YES] ............. [OKAY][YES]......[YES] [OKAY]fused_lamb ............ .............[OKAY]fused_lamb[OKAY] [YES] ................... fused_lamb[YES] .............[OKAY]...... [YES][OKAY]sparse_attn .................. [NO] [OKAY]....... [OKAY]sparse_attn ............transformer sparse_attn [NO]........................ [NO][YES]sparse_attn....... ......................... [OKAY][NO][OKAY][OKAY] ....... transformertransformer[OKAY]stochastic_transformer ......................... transformer[YES] [YES][YES] ............ ............ ...... [YES][OKAY][OKAY] [OKAY] ...... [OKAY] stochastic_transformerstochastic_transformer stochastic_transformer.. .[YES] [YES] [YES] ...... ............ [OKAY][OKAY][OKAY] ninjaninjaninjaninja ...................................................... .................. 
[OKAY] [OKAY] [OKAY][OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- op nameop nameop name op name ................ ................................ ................ installed installedinstalled installed ........ compatible compatible compatiblecompatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam cpu_adam............... cpu_adamcpu_adam...............[YES] ..............................[YES]...... [YES][OKAY]...... [YES] ......[OKAY] ...... [OKAY][OKAY] fused_adam ............. [YES]fused_adam ...... fused_adam[OKAY].............fused_adam ............. [YES].............fused_lamb[YES] ...................[YES] ...... [OKAY][YES] ............ [OKAY] fused_lamb[OKAY] [OKAY] ............. fused_lamb[YES]fused_lamb ................................ [YES][YES][OKAY] ............ [OKAY][OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attntransformer sparse_attnsparse_attn ................................................ [NO] [YES] [NO] [NO].................... .......[OKAY][OKAY][OKAY] [OKAY] stochastic_transformertransformertransformer transformer. ............ [YES]............ ............ ......[YES][YES] [OKAY][YES] ............ ......[OKAY][OKAY] [OKAY] stochastic_transformer stochastic_transformer.stochastic_transformer ..[YES] [YES][YES]...... 
............[OKAY] [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninja--------------------------------------------------JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY] [OKAY][OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------op name op nameop nameop name ................ 
................................................installed installed installedinstalled .. ...... compatible compatible compatiblecompatible-------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam............... cpu_adamcpu_adam...............[YES] [YES].................................... [YES]......[YES][OKAY] ......[OKAY]...... [OKAY][OKAY] fused_adam ............. [YES] ...... [OKAY]fused_adamfused_adam fused_adam .............fused_lamb............. ............. [YES] [YES][YES]...... ............. ...... ......[OKAY][YES] [OKAY][OKAY]...... fused_lamb[OKAY] fused_lambfused_lamb ....................................... [YES][YES] [YES] ............ [OKAY] ......[OKAY]sparse_attn [OKAY]............ [NO] ....... [OKAY] transformer ............sparse_attn sparse_attn [YES] ............sparse_attn ...... ............[NO]............ [OKAY] [NO] .......[NO] stochastic_transformer[OKAY].............. .[OKAY][OKAY] transformer[YES] transformer............ transformer .................. [YES] ............ [OKAY][YES]...... [YES][OKAY] ............ stochastic_transformer[OKAY] [OKAY]. stochastic_transformer[YES] ....... stochastic_transformer[YES][OKAY] ....... [OKAY] [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
--------------------------------------------------DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop name op name ................................ ................ ................ installedinstalledinstalledinstalled ...... .. compatiblecompatible compatiblecompatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adam ...............cpu_adam............... cpu_adam [YES] ............... ............... [YES]...... [YES] [YES] [OKAY] ............ ...... 
[OKAY][OKAY][OKAY] fused_adam ............. [YES] ...... fused_adam[OKAY]fused_adam fused_adam ............. ............. .............[YES] fused_lamb [YES][YES] ...... ............. ...... ......[OKAY] [YES][OKAY][OKAY] fused_lamb...... fused_lamb.............[OKAY] fused_lamb .............[YES] ...................[YES] [OKAY][YES]...... ......[OKAY] sparse_attn[OKAY] ............ [NO] ....... [OKAY] sparse_attntransformer ........................ [NO][YES]sparse_attn sparse_attn ............. ............ ............ [OKAY] [OKAY][NO] [NO] .............. transformerstochastic_transformer ............[OKAY] [OKAY] . [YES] [YES]...... ......transformer[OKAY]transformer [OKAY]........................ [YES][YES]stochastic_transformer ............. [OKAY][OKAY][YES] ...... [OKAY] stochastic_transformer stochastic_transformer. [YES]. ...... [YES][OKAY] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
JIT compiled ops requires ninja JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninja ninja.................. .................. ....................................[OKAY][OKAY] [OKAY][OKAY]---------------------------------------------------------------------------------------------------- --------------------------------------------------op nameop name -------------------------------------------------- op name ................ op name................................installed installed installed.................... installed.. compatiblecompatible compatible..---------------------------------------------------------------------------------------------------- -------------------------------------------------- compatible -------------------------------------------------- cpu_adam cpu_adamcpu_adam............... cpu_adam..............................[YES] [YES].....................[YES] ...... [OKAY][YES]...... [OKAY][OKAY]...... [OKAY] fused_adam fused_adamfused_adamfused_adam............. .......................................[YES] [YES][YES][YES]...... [OKAY].................. [OKAY] [OKAY][OKAY] fused_lambfused_lamb fused_lamb .............fused_lamb ............. [YES] ............. ............. [YES]......[YES] [YES][OKAY]............ ......[OKAY][OKAY] [OKAY] sparse_attn ............ sparse_attnsparse_attn[NO]sparse_attn ........................................... [NO][NO][NO] [OKAY] ....... .............. [OKAY][OKAY][OKAY] transformer ............transformer transformer transformer[YES] ............ ............ .................. [YES] [YES] [OKAY][YES]...... ............[OKAY] [OKAY]stochastic_transformer[OKAY] .stochastic_transformer stochastic_transformer[YES]stochastic_transformer. ........ [YES] [OKAY] [YES] [YES]............ ......[OKAY][OKAY] [OKAY] ninjaninjaninjaninja .................. .................. 
..................[OKAY]..................[OKAY] [OKAY][OKAY]-------------------------------------------------- -------------------------------------------------- op name----------------------------------------------------------------------------------------------------op name ................ op name ................op nameinstalled ................installed.................. installed..installedcompatible compatible..--------------------------------------------------.. --------------------------------------------------compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adam ...............cpu_adam [YES]...............cpu_adamcpu_adam .....................[YES]............... ......[OKAY][YES][YES] [OKAY]............ [OKAY][OKAY] fused_adam ............. [YES]fused_adamfused_adamfused_adam ............. ...................[YES]............. [OKAY][YES][YES]...... ............[OKAY] fused_lamb[OKAY][OKAY] .............fused_lamb [YES]fused_lamb.............fused_lamb ................... [YES]............. [OKAY] [YES] ......[YES] [OKAY]............ [OKAY][OKAY] sparse_attn ............ sparse_attn[NO] sparse_attnsparse_attn................... ............[NO]............ [OKAY][NO] ....... [NO] ....... [OKAY] .......transformer [OKAY] [OKAY]............transformer transformer[YES]............ transformer .................. [YES] ..................[YES][OKAY] [YES][OKAY]...... ......[OKAY] stochastic_transformer stochastic_transformer[OKAY] ..stochastic_transformer [YES][YES]stochastic_transformer . ...... ......[OKAY].[YES] [OKAY] ...... [YES] [OKAY]...... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. 
[WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
[WARNING]  async_io: please install the libaio-devel package with yum
[WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
[WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
[WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 
11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils ..................
[YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. 
async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ......
torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] DeepSpeed general environment info: transformer_inference .. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] utils .................. [YES] ...... [OKAY] torch version .................... 1.8.2 quantizer .............. [NO] ....... [OKAY] torch cuda version ............... 11.1 -------------------------------------------------- nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch version .................... 1.8.2 torch cuda version ............... 11.1 torch cuda version ............... 
11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.21.8.2 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.2 11.2 deepspeed install path deepspeed install path........... ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']deepspeed info deepspeed info................... ...................0.5.5+58a8e13, 58a8e13, master 0.5.5+58a8e13, 58a8e13, masterdeepspeed wheel compiled w. ......deepspeed wheel compiled w. torch 1.8, cuda 11.1...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] DeepSpeed general environment info: utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 
1.8.2 torch cuda version ............... 11.1 DeepSpeed general environment info:nvcc version ..................... 11.2 deepspeed install path ...........torch install path ...............['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] deepspeed wheel compiled w. ...... torch versiontorch 1.8, cuda 11.1 .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 
11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path DeepSpeed general environment info:........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... torch install path0.5.5+58a8e13, 58a8e13, master ...............deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 
0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch install path ...............torch version .................... 1.8.2 torch cuda version['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] ............... 11.1torch version nvcc version.................... .....................1.8.2 11.2 torch cuda versiondeepspeed install path .......................... 
11.1 ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']nvcc version deepspeed info..................... ................... 11.20.5.5+58a8e13, 58a8e13, master deepspeed install path deepspeed wheel compiled w............ ...... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']torch 1.8, cuda 11.1 deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** -------------------------------------------------- DeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** ninjaninjaninjaninja .................. .................................... .................. [OKAY] [OKAY] [OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name op nameop name................op name ................................................installed installed installed ..installed .. ..compatible .. compatible compatible compatible-------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adamcpu_adamcpu_adam .............................................cpu_adam [YES][YES][YES] ................................. [OKAY][YES][OKAY][OKAY] ...... [OKAY] fused_adamfused_adam .......................... fused_adam[YES][YES]fused_adam ...... ................................ [OKAY][YES][OKAY][YES] ............ fused_lamb[OKAY]fused_lamb [OKAY] ............. .............fused_lamb [YES][YES]fused_lamb............. ................... ......[YES] [OKAY][YES] [OKAY] ............ [OKAY][OKAY] sparse_attn ............ [NO] .......sparse_attn [OKAY] sparse_attn............sparse_attn ............transformer[NO]............ ....... ............[NO] [NO] .......[OKAY][YES] .............[OKAY] transformer[OKAY][OKAY] transformer............ 
[YES]............transformer stochastic_transformer [YES]...... ................... [OKAY][YES] [YES][OKAY] stochastic_transformer ............ [OKAY] [OKAY]stochastic_transformer. [YES]. stochastic_transformer...... [YES].[OKAY] ......[YES] [OKAY]...... [OKAY] **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** using world size: 64, data-parallel-size: 32, tensor-model-parallel size: 1, pipeline-model-parallel size: 2 using torch.float16 for parameters ... ------------------------ arguments ------------------------ accumulate_allreduce_grads_in_fp32 .............. False adam_beta1 ...................................... 0.9 adam_beta2 ...................................... 0.999 adam_eps ........................................ 1e-08 adlr_autoresume ................................. False adlr_autoresume_interval ........................ 
1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. None
consumed_train_samples .......................... 0
consumed_train_tokens ........................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
curriculum_learning ............................. False
data_impl ....................................... mmap
data_parallel_size .............................. 32
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.2118965.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 100
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 8192
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
gigaflos_no_embeds .............................. 0
global_batch_size ............................... 512
glu_activation .................................. None
hidden_dropout .................................. 0.1
hidden_size ..................................... 2048
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 128
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 200
log_learning_rate_to_tensorboard ................ True
log_level ....................................... None
log_level_replica ............................... None
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_on_targets_only ............................ True
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 0.0002
lr_decay_iters .................................. None
lr_decay_samples ................................ 73242187
lr_decay_style .................................. cosine
lr_decay_tokens ................................. None
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 183105
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/data/gpt2-merges.txt
micro_batch_size ................................ 1
min_loss_scale .................................. 1.0
min_lr .......................................... 1e-05
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 16
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 24
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 2
position_embedding_type ......................... PositionEmbeddingType.absolute
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['32', '32', '2_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
reweight_loss_based_on_position_frequency ....... True
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 1234
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 1
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/tr6g-1B3-oscar-loss-reweighting-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
test_weighted_split_names ....................... None
test_weighted_split_paths ....................... None
test_weighted_split_splits ...................... None
test_weighted_split_weights ..................... None
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 73242187
train_tokens .................................... None
train_weighted_split_paths ...................... None
use_bnb_optimizer ............................... False
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
valid_weighted_split_names ...................... None
valid_weighted_split_paths ...................... None
valid_weighted_split_splits ..................... None
valid_weighted_split_weights .................... None
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 64
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
> building GPT2BPETokenizer tokenizer ...
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
> padded vocab (size: 50257) with 47 dummy tokens (new size: 50304)
> initializing torch distributed ...
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
> setting tensorboard ...
> initializing tensor model parallel with size 1
> initializing pipeline model parallel with size 2
> setting random seeds to 1234 ...
[2021-11-22 13:47:33,646] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.332 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
  warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension.
Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( >>> done with compiling and loading fused kernels. Compilation time: 21.290 seconds time to initialize megatron (seconds): -4.688 [after megatron is initialized] datetime: 2021-11-22 13:47:55 building GPT model ... 
[2021-11-22 13:47:55,351] [INFO] [utils.py:806:see_memory_usage] Before Building Model
[2021-11-22 13:47:55,352] [INFO] [utils.py:807:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-11-22 13:47:55,352] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 36.52 GB, percent = 19.5%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pipe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=1, data=0, model=0): 32, ProcessCoord(pipe=1, data=1, model=0): 33, ProcessCoord(pipe=1, data=2, model=0): 34, ProcessCoord(pipe=1, data=3, model=0): 35, ProcessCoord(pipe=1, data=4, model=0): 36, ProcessCoord(pipe=1, data=5, model=0): 37, ProcessCoord(pipe=1, data=6, model=0): 38, ProcessCoord(pipe=1, data=7, model=0): 39, ProcessCoord(pipe=1, data=8, model=0): 40, ProcessCoord(pipe=1, data=9, model=0): 41, ProcessCoord(pipe=1, data=10, model=0): 42, ProcessCoord(pipe=1, data=11, model=0): 43, ProcessCoord(pipe=1, data=12, model=0): 44, ProcessCoord(pipe=1, data=13, model=0): 45, ProcessCoord(pipe=1, data=14, model=0): 46, ProcessCoord(pipe=1, data=15, model=0): 47, ProcessCoord(pipe=1, data=16, model=0): 48, ProcessCoord(pipe=1, data=17, model=0): 49, ProcessCoord(pipe=1, data=18, model=0): 50, ProcessCoord(pipe=1, data=19, model=0): 51, ProcessCoord(pipe=1, data=20, model=0): 52, ProcessCoord(pipe=1, data=21, model=0): 53, ProcessCoord(pipe=1, data=22, model=0): 54, ProcessCoord(pipe=1, data=23, model=0): 55, ProcessCoord(pipe=1, data=24, model=0): 56, ProcessCoord(pipe=1, data=25, model=0): 57, ProcessCoord(pipe=1, data=26, model=0): 58, ProcessCoord(pipe=1, data=27, model=0): 59, ProcessCoord(pipe=1, data=28, model=0): 60, ProcessCoord(pipe=1, data=29, model=0): 61, ProcessCoord(pipe=1, data=30, model=0): 62, ProcessCoord(pipe=1, data=31, model=0): 63}
[2021-11-22 13:47:56,649] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=15
    0: _to_float16
    1: EmbeddingPipe
    2:
    3: ParallelTransformerLayerPipe
    4: ParallelTransformerLayerPipe
    5: ParallelTransformerLayerPipe
    6: ParallelTransformerLayerPipe
    7: ParallelTransformerLayerPipe
    8: ParallelTransformerLayerPipe
    9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=1 layers=17
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
    27:
    28:
    29: MixedFusedLayerNorm
    30: EmbeddingPipe
    31: float16_to_fp32
    loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 711520256
[2021-11-22 13:47:57,191] [INFO] [utils.py:806:see_memory_usage] After Building Model
[2021-11-22 13:47:57,192] [INFO] [utils.py:807:see_memory_usage] MA 1.33 GB Max_MA 1.33 GB CA 1.36 GB Max_CA 1 GB
[2021-11-22 13:47:57,192] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 36.74 GB, percent = 19.6%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 711516160
setting training iterations to 152972
> learning rate decay style: cosine
DeepSpeed is enabled.
[2021-11-22 13:47:57,211] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.5.5+58a8e13, git-hash=58a8e13, git-branch=master
[2021-11-22 13:47:57,535] [INFO] [engine.py:207:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-11-22 13:47:57,535] [INFO] [engine.py:862:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-11-22 13:47:57,535] [INFO] [engine.py:868:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-11-22 13:47:57,539] [INFO] [engine.py:884:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-11-22 13:47:57,539] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-11-22 13:47:57,539] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-11-22 13:47:57,539] [INFO] [stage2.py:111:__init__] Reduce bucket size 500000000
[2021-11-22 13:47:57,539] [INFO] [stage2.py:112:__init__] Allgather bucket size 500000000
[2021-11-22 13:47:57,539] [INFO] [stage2.py:113:__init__] CPU Offload: False
[2021-11-22 13:47:57,539] [INFO] [stage2.py:114:__init__] Round robin gradient partitioning: False
Rank: 59 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 43 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 39 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 57 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 36 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 18 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 32 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 63 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 54 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 48 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 61 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 23 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 53 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 10 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 34 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 16 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 27 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 7 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 21 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 9 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 4 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 46 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 3 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 13 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 26 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 40 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 56 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 41 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 37 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 33 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 38 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 60 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 58 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 49 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 42 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 1 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 62 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 25 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 55 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 5 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 17 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 20 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 51 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 30 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 35 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 22 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 45 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 50 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 19 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 6 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 8 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 11 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 29 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 28 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 14 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 31 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 12 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 47 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 15 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 2 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 52 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 24 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 0 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 44 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
[2021-11-22 13:47:59,277] [INFO] [utils.py:806:see_memory_usage] Before initializing optimizer states
[2021-11-22 13:47:59,277] [INFO] [utils.py:807:see_memory_usage] MA 1.41 GB Max_MA 1.45 GB CA 2.71 GB Max_CA 3 GB
[2021-11-22 13:47:59,278] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 38.54 GB, percent = 20.6%
[2021-11-22 13:47:59,316] [INFO] [utils.py:806:see_memory_usage] After initializing optimizer states
[2021-11-22 13:47:59,317] [INFO] [utils.py:807:see_memory_usage] MA 1.57 GB Max_MA 1.66 GB CA 2.96 GB Max_CA 3 GB
[2021-11-22 13:47:59,317] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 38.68 GB, percent = 20.7%
[2021-11-22 13:47:59,317] [INFO] [stage2.py:474:__init__] optimizer state initialized
[2021-11-22 13:47:59,342] [INFO] [utils.py:806:see_memory_usage] After initializing ZeRO optimizer
[2021-11-22 13:47:59,343] [INFO] [utils.py:807:see_memory_usage] MA 1.57 GB Max_MA 1.57 GB CA 2.96 GB Max_CA 3 GB
[2021-11-22 13:47:59,343] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 38.68 GB, percent = 20.7%
[2021-11-22 13:47:59,343] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-11-22 13:47:59,343] [INFO] [engine.py:599:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-11-22 13:47:59,343] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-11-22 13:47:59,343] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-11-22 13:47:59,343] [INFO] [config.py:940:print] DeepSpeedEngine configuration:
[2021-11-22 13:47:59,343] [INFO] [config.py:944:print] activation_checkpointing_config {
    "partition_activations": false,
    "contiguous_memory_optimization": false,
    "cpu_checkpointing": false,
    "number_checkpoints": null,
    "synchronize_checkpoint_boundary": false,
    "profile": false
}
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] allreduce_always_fp32 ........ False
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] amp_enabled .................. False
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] amp_params ................... False
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] checkpoint_tag_validation_enabled True
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] checkpoint_tag_validation_fail False
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] curriculum_enabled ........... False
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] curriculum_params ............ False
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] dataloader_drop_last ......... False
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] disable_allgather ............ False
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] dump_state ................... False
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] eigenvalue_enabled ........... False
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] eigenvalue_gas_boundary_resolution 1
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] eigenvalue_layer_num ......... 0
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] eigenvalue_max_iter .......... 100
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] eigenvalue_stability ......... 1e-06
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] eigenvalue_tol ............... 0.01
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] eigenvalue_verbose ........... False
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] elasticity_enabled ........... False
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] flops_profiler_config ........ {
    "enabled": false,
    "profile_step": 1,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": null
}
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] fp16_enabled ................. True
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] fp16_master_weights_and_gradients False
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] fp16_mixed_quantize .......... False
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] global_rank .................. 0
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] gradient_accumulation_steps .. 16
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] gradient_clipping ............ 1.0
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] gradient_predivide_factor .... 1.0
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] initial_dynamic_scale ........ 4096
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] loss_scale ................... 0
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] memory_breakdown ............. False
[2021-11-22 13:47:59,344] [INFO] [config.py:944:print] optimizer_legacy_fusion ...... False
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] optimizer_name ............... None
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] optimizer_params ............. None
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] pld_enabled .................. False
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] pld_params ................... False
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] prescale_gradients ........... False
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] quantize_change_rate ......... 0.001
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] quantize_groups .............. 1
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] quantize_offset .............. 1000
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] quantize_period .............. 1000
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] quantize_rounding ............ 0
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] quantize_start_bits .......... 16
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] quantize_target_bits ......... 8
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] quantize_training_enabled .... False
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] quantize_type ................ 0
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] quantize_verbose ............. False
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] scheduler_name ............... None
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] scheduler_params ............. None
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] sparse_attention ............. None
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] sparse_gradients_enabled ..... False
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] steps_per_print .............. 2000
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] tensorboard_enabled .......... False
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] tensorboard_output_path ......
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] train_batch_size ............. 512
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] train_micro_batch_size_per_gpu 1
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] use_quantizer_kernel ......... False
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] wall_clock_breakdown ......... False
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] world_size ................... 32
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] zero_allow_untested_optimizer False
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] zero_config .................. {
    "stage": 1,
    "contiguous_gradients": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 5.000000e+08,
    "allgather_partitions": true,
    "allgather_bucket_size": 5.000000e+08,
    "overlap_comm": false,
    "load_from_fp32_weights": true,
    "elastic_checkpoint": true,
    "offload_param": null,
    "offload_optimizer": null,
    "sub_group_size": 1.000000e+09,
    "prefetch_bucket_size": 5.000000e+07,
    "param_persistence_threshold": 1.000000e+05,
    "max_live_parameters": 1.000000e+09,
    "max_reuse_distance": 1.000000e+09,
    "gather_fp16_weights_on_model_save": false,
    "ignore_unused_parameters": true,
    "round_robin_gradients": false,
    "legacy_stage1": false
}
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] zero_enabled .................
True
[2021-11-22 13:47:59,345] [INFO] [config.py:944:print] zero_optimization_stage ...... 1
[2021-11-22 13:47:59,346] [INFO] [config.py:946:print] json = {
    "train_micro_batch_size_per_gpu": 1,
    "train_batch_size": 512,
    "gradient_clipping": 1.0,
    "zero_optimization": {
        "stage": 1
    },
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 500,
        "hysteresis": 2,
        "min_loss_scale": 1,
        "initial_scale_power": 12
    },
    "steps_per_print": 2.000000e+03,
    "wall_clock_breakdown": false
}
[2021-11-22 13:47:59,346] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=16 micro_batch_size=1
[2021-11-22 13:47:59,375] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=15 [0, 15) STAGE_PARAMS=711516160 (711.516M) TOTAL_PARAMS=1423036416 (1423.036M) UNIQUE_PARAMS=1315819520 (1315.820M)
[2021-11-22 13:47:59,375] [INFO] [engine.py:151:__init__] RANK=32 STAGE=1 LAYERS=17 [15, 32) STAGE_PARAMS=711520256 (711.520M) TOTAL_PARAMS=1423036416 (1423.036M) UNIQUE_PARAMS=1315819520 (1315.820M)
> using checkpoint value 0.0002 for learning rate
> using checkpoint value 1e-05 for minimum learning rate
> using checkpoint value 183105 for warmup iterations
> using checkpoint value 73242187 for total number of iterations
> using checkpoint value cosine for decay style
successfully loaded 32 ZeRO state_dicts for rank 51
successfully loaded 32 ZeRO state_dicts for rank 49
successfully loaded 32 ZeRO state_dicts for rank 50
successfully loaded 32 ZeRO state_dicts for rank 48
successfully loaded 32 ZeRO state_dicts for rank 57
successfully loaded 32 ZeRO state_dicts for rank 32
successfully loaded 32 ZeRO state_dicts for rank 56
successfully loaded 32 ZeRO state_dicts for rank 59
successfully loaded 32 ZeRO state_dicts for rank 58
successfully loaded 32 ZeRO state_dicts for rank 35
successfully loaded 32 ZeRO state_dicts for rank 33
successfully loaded 32 ZeRO state_dicts for rank 34
successfully loaded 32 ZeRO state_dicts for rank 52
successfully loaded 32 ZeRO state_dicts for rank 53
successfully loaded 32 ZeRO state_dicts for rank 55
successfully loaded 32 ZeRO state_dicts for rank 54
successfully loaded 32 ZeRO state_dicts for rank 60
successfully loaded 32 ZeRO state_dicts for rank 61
successfully loaded 32 ZeRO state_dicts for rank 44
successfully loaded 32 ZeRO state_dicts for rank 45
successfully loaded 32 ZeRO state_dicts for rank 47
successfully loaded 32 ZeRO state_dicts for rank 46
successfully loaded 32 ZeRO state_dicts for rank 39
successfully loaded 32 ZeRO state_dicts for rank 36
successfully loaded 32 ZeRO state_dicts for rank 38
successfully loaded 32 ZeRO state_dicts for rank 37
successfully loaded 32 ZeRO state_dicts for rank 41
successfully loaded 32 ZeRO state_dicts for rank 43
successfully loaded 32 ZeRO state_dicts for rank 40
successfully loaded 32 ZeRO state_dicts for rank 42
successfully loaded 32 ZeRO state_dicts for rank 10
successfully loaded 32 ZeRO state_dicts for rank 25
successfully loaded 32 ZeRO state_dicts for rank 8
successfully loaded 32 ZeRO state_dicts for rank 11
successfully loaded 32 ZeRO state_dicts for rank 5
successfully loaded 32 ZeRO state_dicts for rank 12
successfully loaded 32 ZeRO state_dicts for rank 26
successfully loaded 32 ZeRO state_dicts for rank 7
successfully loaded 32 ZeRO state_dicts for rank 24
successfully loaded 32 ZeRO state_dicts for rank 13
successfully loaded 32 ZeRO state_dicts for rank 9
successfully loaded 32 ZeRO state_dicts for rank 14
successfully loaded 32 ZeRO state_dicts for rank 15
successfully loaded 32 ZeRO state_dicts for rank 1
successfully loaded 32 ZeRO state_dicts for rank 3
successfully loaded 32 ZeRO state_dicts for rank 0
successfully loaded 32 ZeRO state_dicts for rank 21
successfully loaded 32 ZeRO state_dicts for rank 20
successfully loaded 32 ZeRO state_dicts for rank 22
successfully loaded 32 ZeRO state_dicts for rank 19
successfully loaded 32 ZeRO state_dicts for rank 17
successfully loaded 32 ZeRO state_dicts for rank 16
successfully loaded 32 ZeRO state_dicts for rank 4
successfully loaded 32 ZeRO state_dicts for rank 6
successfully loaded 32 ZeRO state_dicts for rank 62
successfully loaded 32 ZeRO state_dicts for rank 63
successfully loaded 32 ZeRO state_dicts for rank 29
successfully loaded 32 ZeRO state_dicts for rank 30
successfully loaded 32 ZeRO state_dicts for rank 31
successfully loaded 32 ZeRO state_dicts for rank 28
successfully loaded 32 ZeRO state_dicts for rank 27
successfully loaded 32 ZeRO state_dicts for rank 2
successfully loaded 32 ZeRO state_dicts for rank 23
successfully loaded 32 ZeRO state_dicts for rank 18
loading 32 zero partition checkpoints for rank 51
loading 32 zero partition checkpoints for rank 58
loading 32 zero partition checkpoints for rank 57
loading 32 zero partition checkpoints for rank 59
loading 32 zero partition checkpoints for rank 35
loading 32 zero partition checkpoints for rank 32
loading 32 zero partition checkpoints for rank 52
loading 32 zero partition checkpoints for rank 49
loading 32 zero partition checkpoints for rank 48
loading 32 zero partition checkpoints for rank 50
loading 32 zero partition checkpoints for rank 34
loading 32 zero partition checkpoints for rank 33
loading 32 zero partition checkpoints for rank 56
loading 32 zero partition checkpoints for rank 61
loading 32 zero partition checkpoints for rank 60
loading 32 zero partition checkpoints for rank 54
loading 32 zero partition checkpoints for rank 44
loading 32 zero partition checkpoints for rank 45
loading 32 zero partition checkpoints for rank 46
loading 32 zero partition checkpoints for rank 39
loading 32 zero partition checkpoints for rank 36
loading 32 zero partition checkpoints for rank 37
loading 32 zero partition checkpoints for rank 38
loading 32 zero partition checkpoints for rank 42
loading 32 zero partition checkpoints for rank 5
loading 32 zero partition checkpoints for rank 10
loading 32 zero partition checkpoints for rank 25
loading 32 zero partition checkpoints for rank 1
loading 32 zero partition checkpoints for rank 43
loading 32 zero partition checkpoints for rank 41
loading 32 zero partition checkpoints for rank 47
loading 32 zero partition checkpoints for rank 26
loading 32 zero partition checkpoints for rank 9
loading 32 zero partition checkpoints for rank 24
loading 32 zero partition checkpoints for rank 3
loading 32 zero partition checkpoints for rank 12
loading 32 zero partition checkpoints for rank 11
loading 32 zero partition checkpoints for rank 8
loading 32 zero partition checkpoints for rank 15
loading 32 zero partition checkpoints for rank 7
loading 32 zero partition checkpoints for rank 0
loading 32 zero partition checkpoints for rank 13
checkpoint version 3.0
loading 32 zero partition checkpoints for rank 21
loading 32 zero partition checkpoints for rank 40
loading 32 zero partition checkpoints for rank 20
loading 32 zero partition checkpoints for rank 14
loading 32 zero partition checkpoints for rank 22
loading 32 zero partition checkpoints for rank 6
loading 32 zero partition checkpoints for rank 17
loading 32 zero partition checkpoints for rank 16
loading 32 zero partition checkpoints for rank 19
loading 32 zero partition checkpoints for rank 63
loading 32 zero partition checkpoints for rank 4
loading 32 zero partition checkpoints for rank 62
loading 32 zero partition checkpoints for rank 30
loading 32 zero partition checkpoints for rank 29
loading 32 zero partition checkpoints for rank 2
loading 32 zero partition checkpoints for rank 27
loading 32 zero partition checkpoints for rank 31
loading 32 zero partition checkpoints for rank 28
loading 32 zero partition checkpoints for rank 23
loading 32 zero partition checkpoints for rank 18
loading 32 zero partition checkpoints for rank 53
loading 32 zero partition checkpoints for rank 55
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints at iteration 22513
time (ms) | load-checkpoint: 11709.33
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
estimated model parameters: 1.42303232
estimated model parameters without embeddings: 1.208598528
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-11-22 13:48:11
> building train, validation, and test datasets ...
 > datasets target sizes (minimum size):
    train: 73242187
    validation: 7833600
    test: 51200
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 0.236582 seconds
    number of documents: 304230423
 > dataset split:
    train:
     document indices in [0, 288714672) total of 288714672 documents
    validation:
     document indices in [288714672, 303926193) total of 15211521 documents
    test:
     document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.070 seconds
    total number of samples: 131537224
    total number of epochs: 1
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.204 seconds
    total number of samples: 13854322
    total number of epochs: 2
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.084 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-11-22 13:48:25
done with setup ...
training ...
time (ms) | model-and-optimizer-setup: 15801.83 | train/valid/test-data-iterators-setup: 13497.35
Number of parameters: 1.42303232 billion
Number of parameters: 1.423040512 billion
Number of parameters without embeddings: 1.208598528 billion
Number of parameters without embeddings: 1.20860672 billion
[before the start of training step] datetime: 2021-11-22 13:48:25
[2021-11-22 13:48:25,174] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information
[2021-11-22 13:48:25,174] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-11-22 13:48:25,174] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-11-22 13:48:25,174] [INFO] [checkpointing.py:554:forward] ----Synchronization False
[2021-11-22 13:48:25,174] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False
/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
  if t.grad is not None:
/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward().
If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
if t.grad is not None:
[Rank 32] (after 22600 iterations) memory (MB) | allocated: 2443.63623046875 | max allocated: 4725.25341796875 | reserved: 7900.0 | max reserved: 7900.0
[Rank 0] (after 22600 iterations) memory (MB) | allocated: 1631.6650390625 | max allocated: 3929.2744140625 | reserved: 6816.0 | max reserved: 6816.0
iteration 22600/ 152972 | consumed samples: 6491584 | consumed tokens: 13294764032 | elapsed time per iteration (ms): 4676.9 | learning rate: 1.965E-04 | global batch size: 512 | lm loss: 1.607318E+00 | loss scale: 131072.0 | grad norm: 8377.212 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 22800/ 152972 | consumed samples: 6593984 | consumed tokens: 13504479232 | elapsed time per iteration (ms): 4652.0 | learning rate: 1.964E-04 | global batch size: 512 | lm loss: 1.661776E+00 | loss scale: 262144.0 | grad norm: 26889.519 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 23000/ 152972 | consumed samples: 6696384 | consumed tokens: 13714194432 | elapsed time per iteration (ms): 4649.5 | learning rate: 1.963E-04 | global batch size: 512 | lm loss: 1.648786E+00 | loss scale: 131072.0 | grad norm: 10785.248 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 23000 | lm loss value: 1.620875E+00 | lm loss PPL: 5.057516E+00 |
-------------------------------------------------------------------------------------------
iteration 23200/ 152972 | consumed samples: 6798784 | consumed tokens: 13923909632 | elapsed time per iteration (ms): 5295.2 | learning rate: 1.962E-04 | global batch size: 512 | lm loss: 1.611943E+00 | loss scale: 16384.0 | grad norm: 2279.853 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 23400/ 152972 | consumed samples: 6901184 | consumed tokens: 14133624832 | elapsed time per iteration (ms): 4637.5 | learning rate: 1.961E-04 | global batch size: 512 | lm loss: 1.634911E+00 | loss scale: 16384.0 | grad norm: 1514.031 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 23600/ 152972 | consumed samples: 7003584 | consumed tokens: 14343340032 | elapsed time per iteration (ms): 4644.8 | learning rate: 1.960E-04 | global batch size: 512 | lm loss: 1.601279E+00 | loss scale: 16384.0 | grad norm: 1805.766 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 23800/ 152972 | consumed samples: 7105984 | consumed tokens: 14553055232 | elapsed time per iteration (ms): 4655.1 | learning rate: 1.958E-04 | global batch size: 512 | lm loss: 1.589386E+00 | loss scale: 32768.0 | grad norm: 2930.730 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-22 15:45:46,511] [INFO] [logging.py:68:log_dist] [Rank 0] step=24000, skipped=48, lr=[0.00019571664562609086, 0.00019571664562609086], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 24000 loss: 2.7315 iter time (s): 0.002 samples/sec: 220795.058
iteration 24000/ 152972 | consumed samples: 7208384 | consumed tokens: 14762770432 | elapsed time per iteration (ms): 4639.3 | learning rate: 1.957E-04 | global batch size: 512 | lm loss: 1.637670E+00 | loss scale: 32768.0 | grad norm: 6284.194 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 24000 | lm loss value: 1.674811E+00 | lm loss PPL: 5.337788E+00 |
-------------------------------------------------------------------------------------------
saving checkpoint at iteration 24000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-22 15:48:04,135] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint:
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/mp_rank_00_model_states.pt [2021-11-22 15:48:04,568] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,571] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,587] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,597] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,597] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,597] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,601] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,601] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,601] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,604] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,606] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,606] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,607] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,609] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,610] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,612] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,613] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,614] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,615] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,620] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,620] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,620] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,624] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,625] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,625] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,625] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,625] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,626] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,626] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,626] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,626] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,627] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,627] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,628] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,629] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,629] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,632] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,637] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,637] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,638] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,638] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,641] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,645] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,645] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,646] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,646] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,646] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,647] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,647] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,647] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,650] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,650] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,650] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,652] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,652] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,652] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,653] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,654] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,656] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,658] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,658] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-22 15:48:04,659] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,661] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-22 15:48:04,662] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step24000/zero_pp_rank_24_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 24000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2881.52
iteration 24200/ 152972 | consumed samples: 7310784 | consumed tokens: 14972485632 | elapsed time per iteration (ms): 5330.7 | learning rate: 1.956E-04 | global batch size: 512 | lm loss: 1.634252E+00 | loss scale: 65536.0 | grad norm: 6354.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 24400/ 152972 | consumed samples: 7413184 | consumed tokens: 15182200832 | elapsed time per iteration (ms): 4654.4 | learning rate: 1.955E-04 | global batch size: 512 | lm loss: 1.684215E+00 | loss scale: 65536.0 | grad norm: 8473.375 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 24600/ 152972 | consumed samples: 7515584 | consumed tokens: 15391916032 | elapsed time per iteration (ms): 4640.9 | learning rate: 1.953E-04 | global batch size: 512 | lm loss: 1.659131E+00 | loss scale: 65536.0 | grad norm: 5776.183 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 24800/ 152972 | consumed samples: 7617984 | consumed tokens: 15601631232 | elapsed time per iteration (ms): 4651.4 | learning rate: 1.952E-04 | global batch size: 512 | lm loss: 1.578858E+00 | loss scale: 131072.0 | grad norm: 11994.946 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 25000/ 152972 | consumed samples: 7720384 | consumed tokens: 15811346432 | elapsed time per iteration (ms): 4638.8 | learning rate: 1.951E-04 | global batch size: 512 | lm loss: 1.621120E+00 | loss scale: 131072.0 | grad norm: 6637.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 25000 | lm loss value: 1.562452E+00 | lm loss PPL: 4.770502E+00 |
-------------------------------------------------------------------------------------------
iteration 25200/ 152972 | consumed samples: 7822784 | consumed tokens: 16021061632 | elapsed time per iteration (ms): 5220.2 | learning rate: 1.949E-04 | global batch size: 512 | lm loss: 1.688553E+00 | loss scale: 262144.0 | grad norm: 14764.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 25400/ 152972 | consumed samples: 7925184 | consumed tokens: 16230776832 | elapsed time per iteration (ms): 4645.1 | learning rate: 1.948E-04 | global batch size: 512 | lm loss: 1.618628E+00 | loss scale: 262144.0 | grad norm: 22942.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 25500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-22 17:46:08,968] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/mp_rank_00_model_states.pt
[2021-11-22 17:46:09,400] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_4_mp_rank_00_optim_states.pt
[2021-11-22 17:46:09,400] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_10_mp_rank_00_optim_states.pt
[2021-11-22 17:46:09,402] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_7_mp_rank_01_optim_states.pt
[2021-11-22 17:46:09,404] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-22 17:46:09,404] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-22 17:46:09,405] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-22 17:46:09,406] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-22 17:46:09,407] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-22 17:46:09,407] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-22 17:46:09,407] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-22 17:46:09,408] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-22 17:46:09,408] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint 
saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-22 17:46:09,408] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-22 17:46:09,408] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-22 17:46:09,409] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-22 17:46:09,409] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-22 17:46:09,411] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-22 17:46:09,411] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-22 17:46:09,413] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-22 17:46:09,413] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-22 17:46:09,414] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-22 17:46:09,414] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-22 17:46:09,414] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-22 17:46:09,414] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-22 17:46:09,416] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-22 17:46:09,417] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-22 17:46:09,418] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-22 17:46:09,419] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-22 17:46:09,427] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-22 17:46:09,428] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-22 17:46:09,429] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-22 17:46:09,430] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-22 17:46:09,431] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-22 17:46:09,433] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-22 17:46:09,433] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-22 17:46:09,434] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-22 17:46:09,435] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-22 17:46:09,435] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-22 17:46:09,436] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-22 17:46:09,439] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-22 17:46:09,441] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-22 17:46:09,442] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-22 17:46:09,442] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-22 17:46:09,442] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-22 17:46:09,442] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-22 17:46:09,443] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-22 17:46:09,443] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-22 17:46:09,444] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-22 17:46:09,445] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-22 17:46:09,445] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-22 17:46:09,446] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-22 17:46:09,447] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-22 17:46:09,449] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-22 17:46:09,453] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-22 17:46:09,453] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-22 17:46:09,453] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-22 17:46:09,454] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-22 17:46:09,456] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-22 17:46:09,457] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-22 17:46:09,457] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_4_mp_rank_01_optim_states.pt
[2021-11-22 17:46:09,457] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_6_mp_rank_01_optim_states.pt
[2021-11-22 17:46:09,460] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_27_mp_rank_00_optim_states.pt
[2021-11-22 17:46:09,461] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_28_mp_rank_00_optim_states.pt
[2021-11-22 17:46:09,462] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step25500/zero_pp_rank_0_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 25500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2763.01
iteration 25600/ 152972 | consumed samples: 8027584 | consumed tokens: 16440492032 | elapsed time per iteration (ms): 4691.1 | learning rate: 1.947E-04 | global batch size: 512 | lm loss: 1.635568E+00 | loss scale: 262144.0 | grad norm: 42040.735 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 25800/ 152972 | consumed samples: 8129984 | consumed tokens: 16650207232 | elapsed time per iteration (ms): 4644.1 | learning rate: 1.945E-04 | global batch size: 512 | lm loss: 1.635332E+00 | loss scale: 131072.0 | grad norm: 16883.759 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-22 18:24:58,525] [INFO] [logging.py:68:log_dist] [Rank 0] step=26000, skipped=51, lr=[0.0001943893230058786, 0.0001943893230058786], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 26000 loss: 1.2834 iter time (s): 0.002 samples/sec: 221271.903
iteration 26000/ 152972 | consumed samples: 8232384 | consumed tokens: 16859922432 | elapsed time per iteration (ms): 4643.3 | learning rate: 1.944E-04 | global batch size: 512 | lm loss: 1.580298E+00 | loss scale: 131072.0 | grad norm: 11530.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 26000 | lm loss value: 1.612349E+00 | lm loss PPL: 5.014577E+00 |
-------------------------------------------------------------------------------------------
iteration 26200/ 152972 | consumed samples: 8334784 | consumed tokens: 17069637632 | elapsed time per iteration (ms): 5188.1 | learning rate: 1.942E-04 | global batch size: 512 | lm loss: 1.608192E+00 | loss scale: 262144.0 | grad norm: 25958.677 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 26400/ 152972 | consumed samples: 8437184 | consumed tokens: 17279352832 | elapsed time per iteration (ms): 4643.5 | learning rate: 1.941E-04 | global batch size: 512 | lm loss: 1.647146E+00 | loss scale: 262144.0 | grad norm: 21674.106 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 26600/ 152972 | consumed samples: 8539584 | consumed tokens: 17489068032 | elapsed time per iteration (ms): 4644.1 | learning rate: 1.940E-04 | global batch size: 512 | lm loss: 1.674143E+00 | loss scale: 65536.0 | grad norm: 17308.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 26800/ 152972 | consumed samples: 8641984 | consumed tokens: 17698783232 | elapsed time per iteration (ms): 4647.4 | learning rate: 1.938E-04 | global batch size: 512 | lm loss: 1.634238E+00 | loss scale: 65536.0 | grad norm: 6158.904 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 27000/ 152972 | consumed samples: 8744384 | consumed tokens: 17908498432 | elapsed time per iteration (ms): 4637.1 | learning rate: 1.937E-04 | global batch size: 512 | lm loss: 1.638250E+00 | loss scale: 8192.0 | grad norm: 944.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 27000 | lm loss value: 1.634937E+00 | lm loss PPL: 5.129136E+00 |
-------------------------------------------------------------------------------------------
saving checkpoint at iteration 27000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-22 19:46:02,094] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/mp_rank_00_model_states.pt
[2021-11-22 19:46:02,519] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_0_mp_rank_01_optim_states.pt
[2021-11-22 19:46:02,521] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_4_mp_rank_00_optim_states.pt
[2021-11-22 19:46:02,523] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_29_mp_rank_01_optim_states.pt
[2021-11-22 19:46:02,527] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-22 19:46:02,529] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-22 19:46:02,529] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-22 19:46:02,529] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-22 19:46:02,529] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-22 19:46:02,530] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-22 19:46:02,530] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-22 19:46:02,530] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-22 19:46:02,532] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-22 19:46:02,532] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-22 19:46:02,532] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-22 19:46:02,535] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-22 19:46:02,536] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-22 19:46:02,536] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-22 19:46:02,537] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-22 19:46:02,537] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-22 19:46:02,538] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-22 19:46:02,538] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-22 19:46:02,539] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-22 19:46:02,540] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-22 19:46:02,540] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-22 19:46:02,543] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-22 19:46:02,544] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-22 19:46:02,545] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-22 19:46:02,547] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-22 19:46:02,550] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-22 19:46:02,551] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-22 19:46:02,553] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-22 19:46:02,555] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-22 19:46:02,556] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-22 19:46:02,556] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-22 19:46:02,556] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-22 19:46:02,557] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-22 19:46:02,557] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-22 19:46:02,559] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-22 19:46:02,560] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-22 19:46:02,561] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-22 19:46:02,564] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-22 19:46:02,564] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-22 19:46:02,565] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-22 19:46:02,566] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-22 19:46:02,566] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-22 19:46:02,567] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-22 19:46:02,569] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-22 19:46:02,569] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-22 19:46:02,571] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-22 19:46:02,572] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-22 19:46:02,575] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-22 19:46:02,575] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-22 19:46:02,576] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-22 19:46:02,577] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-22 19:46:02,578] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-22 19:46:02,581] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-22 19:46:02,581] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-22 19:46:02,582] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-22 19:46:02,583] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-22 19:46:02,583] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_27_mp_rank_00_optim_states.pt
[2021-11-22 19:46:02,586] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_12_mp_rank_01_optim_states.pt
[2021-11-22 19:46:02,587] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_17_mp_rank_00_optim_states.pt
[2021-11-22 19:46:02,587] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_19_mp_rank_00_optim_states.pt
[2021-11-22 19:46:02,594] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step27000/zero_pp_rank_12_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 27000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2738.36
iteration 27200/ 152972 | consumed samples: 8846784 | consumed tokens: 18118213632 | elapsed time per iteration (ms): 5184.7 | learning rate: 1.935E-04 | global batch size: 512 | lm loss: 1.589761E+00 | loss scale: 8192.0 | grad norm: 837.211 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 27400/ 152972 | consumed samples: 8949184 | consumed tokens: 18327928832 | elapsed time per iteration (ms): 4639.0 | learning rate: 1.934E-04 | global batch size: 512 | lm loss: 1.669330E+00 | loss scale: 8192.0 | grad norm: 1557.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 27600/ 152972 | consumed samples: 9051584 | consumed tokens: 18537644032 | elapsed time per iteration (ms): 4640.1 | learning rate: 1.932E-04 | global batch size: 512 | lm loss: 1.613755E+00 | loss scale: 16384.0 | grad norm: 1729.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 27800/ 152972 | consumed samples: 9153984 | consumed tokens: 18747359232 | elapsed time per iteration (ms): 4642.9 | learning rate: 1.930E-04 | global batch size: 512 | lm loss: 1.656630E+00 | loss scale: 16384.0 | grad norm: 1859.078 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-22 21:03:19,993] [INFO] [logging.py:68:log_dist] [Rank 0] step=28000, skipped=57, lr=[0.0001928916141670899, 0.0001928916141670899], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 28000 loss: 2.5548 iter time (s): 0.002 samples/sec: 220060.042
iteration 28000/ 152972 | consumed samples: 9256384 | consumed tokens: 18957074432 | elapsed time per iteration (ms): 4640.4 | learning rate: 1.929E-04 | global batch size: 512 | lm loss: 1.571704E+00 | loss scale: 32768.0 | grad norm: 4819.851 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 28000 | lm loss value: 1.564850E+00 | lm loss PPL: 4.781960E+00 |
-------------------------------------------------------------------------------------------
iteration 28200/ 152972 | consumed samples: 9358784 | consumed tokens: 19166789632 | elapsed time per iteration (ms): 5187.9 | learning rate: 1.927E-04 | global batch size: 512 | lm loss: 1.631217E+00 | loss scale: 32768.0 | grad norm: 2953.442 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 28400/ 152972 | consumed samples: 9461184 | consumed tokens: 19376504832 | elapsed time per iteration (ms): 4622.8 | learning rate: 1.926E-04 | global batch size: 512 | lm loss: 1.591141E+00 | loss scale: 32768.0 | grad norm:
3212.382 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | saving checkpoint at iteration 28500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-22 21:43:47,723] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/mp_rank_00_model_states.pt [2021-11-22 21:43:48,146] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,147] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,150] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,153] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,153] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,153] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,154] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,155] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,156] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,156] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,158] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,159] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,159] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,160] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,160] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,161] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,162] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,162] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,162] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,162] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,164] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,166] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,168] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,169] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,172] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,174] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,176] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,178] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,179] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,180] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,180] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,185] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,186] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,187] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,187] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,187] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,188] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,189] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,189] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,190] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,191] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,191] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,192] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,193] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,194] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,194] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,195] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,196] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,196] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,196] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,197] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,198] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,198] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,199] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,205] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,206] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,207] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,207] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,207] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,208] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-22 21:43:48,209] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,210] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-22 21:43:48,214] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_12_mp_rank_01_optim_states.pt
[2021-11-22 21:43:48,216] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step28500/zero_pp_rank_10_mp_rank_01_optim_states.pt
successfully saved checkpoint at iteration 28500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2780.58
iteration 28600/ 152972 | consumed samples: 9563584 | consumed tokens: 19586220032 | elapsed time per iteration (ms): 4644.6 | learning rate: 1.924E-04 | global batch size: 512 | lm loss: 1.551327E+00 | loss scale: 65536.0 | grad norm: 10357.721 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 28800/ 152972 | consumed samples: 9665984 | consumed tokens: 19795935232 | elapsed time per iteration (ms): 4642.6 | learning rate: 1.922E-04 | global batch size: 512 | lm loss: 1.528846E+00 | loss scale: 65536.0 | grad norm: 7973.712 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 29000/ 152972 | consumed samples: 9768384 | consumed tokens: 20005650432 | elapsed time per iteration (ms): 4643.3 | learning rate: 1.921E-04 | global batch size: 512 | lm loss: 1.550174E+00 | loss scale: 131072.0 | grad norm: 20722.690 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 29000 | lm loss value: 1.652734E+00 | lm loss PPL: 5.221235E+00 |
-------------------------------------------------------------------------------------------
iteration 29200/ 152972 | consumed samples: 9870784 | consumed tokens: 20215365632 | elapsed time per iteration (ms): 5192.4 | learning rate: 1.919E-04 | global batch size: 512 | lm loss: 1.575513E+00 | loss scale: 131072.0 | grad norm: 11538.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 29400/ 152972 | consumed samples: 9973184 | consumed tokens: 20425080832 | elapsed time per iteration (ms): 4649.9 | learning rate: 1.917E-04 | global batch size: 512 | lm loss: 1.615409E+00 | loss scale: 65536.0 | grad norm: 5970.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 29600/ 152972 | consumed samples: 10075584 | consumed tokens: 20634796032 | elapsed time per iteration (ms): 4646.0 | learning rate: 1.916E-04 | global batch size: 512 | lm loss: 1.578288E+00 | loss scale: 32768.0 | grad norm: 3226.955 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 29800/ 152972 | consumed samples: 10177984 | consumed tokens: 20844511232 | elapsed time per iteration (ms): 4628.4 | learning rate: 1.914E-04 | global batch size: 512 | lm loss: 1.950755E+00 | loss scale: 512.0 | grad norm: 75.807 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-22 23:41:36,670] [INFO] [logging.py:68:log_dist] [Rank 0] step=30000, skipped=66, lr=[0.0001912271758320237, 0.0001912271758320237], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 30000 loss: 1.1046 iter time (s): 0.002 samples/sec: 222117.191
iteration 30000/ 152972 | consumed samples: 10280384 | consumed tokens: 21054226432 | elapsed time per iteration (ms): 4625.6 | learning rate: 1.912E-04 | global batch size: 512 | lm loss: 1.622989E+00 | loss scale: 512.0 | grad norm: 43.133 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 30000 | lm loss value: 1.571334E+00 | lm loss PPL: 4.813067E+00 |
-------------------------------------------------------------------------------------------
saving checkpoint at
iteration 30000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-22 23:43:28,215] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/mp_rank_00_model_states.pt [2021-11-22 23:43:28,641] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,642] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,642] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,644] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,646] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,647] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,648] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_22_mp_rank_00_optim_states.pt 
[2021-11-22 23:43:28,649] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,650] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,650] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,650] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,651] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,651] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,652] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,652] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,652] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint 
saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,654] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,658] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,659] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,660] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,660] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,661] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,661] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,662] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,662] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,666] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,666] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,667] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,668] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,668] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,668] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,668] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,671] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,675] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,674] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,675] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,678] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,679] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,679] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,680] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,681] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,682] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,684] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,685] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,686] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,686] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,687] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,687] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,687] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,688] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,688] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,689] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,689] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,692] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,693] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,697] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,707] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,707] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,710] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,712] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,712] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,719] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-22 23:43:28,719] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-22 23:43:28,727] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step30000/zero_pp_rank_23_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 30000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2772.04
iteration 30200/ 152972 | consumed samples: 10382784 | consumed tokens: 21263941632 | elapsed time per iteration (ms): 5180.1 | learning rate: 1.911E-04 | global batch size: 512 | lm loss: 1.597819E+00 | loss scale: 1024.0 | grad norm: 119.934 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 30400/ 152972 | consumed samples: 10485184 | consumed tokens: 21473656832 | elapsed time per iteration (ms): 4618.6 | learning rate: 1.909E-04 | global batch size: 512 | lm loss: 1.640159E+00 | loss scale: 1024.0 | grad norm: 90.306 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 30600/ 152972 | consumed samples: 10587584 | consumed tokens: 21683372032 | elapsed time per iteration (ms): 4628.4 | learning rate: 1.907E-04 | global batch size: 512 | lm loss: 1.610327E+00 | loss scale: 1024.0 | grad norm: 118.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 30800/ 152972 | consumed samples: 10689984 | consumed tokens: 21893087232 | elapsed time per iteration (ms): 4626.4 | learning rate: 1.905E-04 | global batch size: 512 | lm loss: 1.528207E+00 | loss scale: 2048.0 | grad norm: 140.266 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 31000/ 152972 | consumed samples: 10792384 | consumed tokens: 22102802432 | elapsed time per iteration (ms): 4622.5 | learning rate: 1.903E-04 | global batch size: 512 | lm loss: 1.592801E+00 | loss scale: 2048.0 | grad norm: 161.786 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 31000 | lm loss value: 1.574393E+00 | lm loss PPL: 4.827811E+00 |
-------------------------------------------------------------------------------------------
iteration 31200/ 152972 | consumed samples: 10894784 | consumed tokens: 22312517632 | elapsed time per iteration (ms): 5169.6 | learning rate: 1.901E-04 | global batch size: 512 | lm loss: 1.621718E+00 | loss scale: 4096.0 | grad norm: 505.395 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 31400/ 152972 | consumed samples: 10997184 | consumed tokens: 22522232832 | elapsed time per iteration (ms): 4632.4 | learning rate: 1.900E-04 | global batch size: 512 | lm loss: 1.602613E+00 | loss scale: 4096.0 | grad norm: 308.085 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 31500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-23 01:40:57,140] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/mp_rank_00_model_states.pt
[2021-11-23 01:40:57,565] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_12_mp_rank_01_optim_states.pt
[2021-11-23 01:40:57,569] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_10_mp_rank_01_optim_states.pt
[2021-11-23 01:40:57,571] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_18_mp_rank_00_optim_states.pt
[2021-11-23 01:40:57,572] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,574] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,574] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-23 01:40:57,574] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-23 01:40:57,575] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-23 01:40:57,576] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-23 01:40:57,577] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-23 01:40:57,577] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-23 01:40:57,578] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint 
saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-23 01:40:57,578] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,579] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,579] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-23 01:40:57,580] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-23 01:40:57,580] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,581] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-23 01:40:57,581] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,582] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,582] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,583] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-23 01:40:57,585] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-23 01:40:57,586] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,586] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,587] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,587] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-23 01:40:57,588] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,590] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,590] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,593] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,597] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,599] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-23 01:40:57,599] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-23 01:40:57,600] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,600] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,602] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-23 01:40:57,604] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,606] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,607] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-23 01:40:57,607] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,607] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-23 01:40:57,609] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-23 01:40:57,609] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-23 01:40:57,609] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-23 01:40:57,611] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-23 01:40:57,612] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-23 01:40:57,612] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-23 01:40:57,612] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,614] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,614] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,614] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,615] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-23 01:40:57,616] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-23 01:40:57,616] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-23 01:40:57,618] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,619] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,619] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,623] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-23 01:40:57,623] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,625] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,628] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-23 01:40:57,643] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-23 01:40:57,663] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step31500/zero_pp_rank_13_mp_rank_01_optim_states.pt successfully saved checkpoint at iteration 31500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2696.19 iteration 31600/ 152972 | consumed samples: 11099584 | consumed tokens: 22731948032 | elapsed time per iteration (ms): 4644.4 | learning rate: 1.898E-04 | global batch size: 512 | lm loss: 1.606618E+00 | loss scale: 4096.0 | grad norm: 398.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 31800/ 152972 | consumed samples: 11201984 | consumed tokens: 22941663232 | elapsed time per iteration (ms): 4634.5 | learning rate: 1.896E-04 | global batch size: 512 | lm loss: 1.620350E+00 | loss scale: 8192.0 | grad norm: 879.349 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [2021-11-23 02:19:34,204] [INFO] [logging.py:68:log_dist] [Rank 0] 
step=32000, skipped=66, lr=[0.00018938843749038024, 0.00018938843749038024], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 32000/ 152972 | consumed samples: 11304384 | consumed tokens: 23151378432 | elapsed time per iteration (ms): 4630.8 | learning rate: 1.894E-04 | global batch size: 512 | lm loss: 1.561473E+00 | loss scale: 8192.0 | grad norm: 1098.123 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
steps: 32000 loss: 2.0631 iter time (s): 0.002 samples/sec: 222071.988
-------------------------------------------------------------------------------------------
valid loss at iteration 32000 | lm loss value: 1.583512E+00 | lm loss PPL: 4.872037E+00 |
-------------------------------------------------------------------------------------------
iteration 32200/ 152972 | consumed samples: 11406784 | consumed tokens: 23361093632 | elapsed time per iteration (ms): 5173.7 | learning rate: 1.892E-04 | global batch size: 512 | lm loss: 1.616048E+00 | loss scale: 16384.0 | grad norm: 3105.331 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 32400/ 152972 | consumed samples: 11509184 | consumed tokens: 23570808832 | elapsed time per iteration (ms): 4633.6 | learning rate: 1.890E-04 | global batch size: 512 | lm loss: 1.556438E+00 | loss scale: 16384.0 | grad norm: 1771.036 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 32600/ 152972 | consumed samples: 11611584 | consumed tokens: 23780524032 | elapsed time per iteration (ms): 4649.5 | learning rate: 1.888E-04 | global batch size: 512 | lm loss: 1.614357E+00 | loss scale: 16384.0 | grad norm: 1275.355 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 32800/ 152972 | consumed samples: 11713984 | consumed tokens: 23990239232 | elapsed time per iteration (ms): 4635.3 | learning rate: 1.886E-04 | global batch size: 512 | lm loss: 1.578462E+00 | loss scale: 32768.0 |
grad norm: 2685.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 33000/ 152972 | consumed samples: 11816384 | consumed tokens: 24199954432 | elapsed time per iteration (ms): 4625.8 | learning rate: 1.884E-04 | global batch size: 512 | lm loss: 1.580319E+00 | loss scale: 32768.0 | grad norm: 2345.006 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------- valid loss at iteration 33000 | lm loss value: 1.591563E+00 | lm loss PPL: 4.911419E+00 | ------------------------------------------------------------------------------------------- saving checkpoint at iteration 33000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-23 03:40:28,453] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/mp_rank_00_model_states.pt [2021-11-23 03:40:28,882] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,884] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,884] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,885] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,886] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,886] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,887] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,887] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,887] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,887] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,889] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,890] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,891] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,891] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,892] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,893] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,893] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,894] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,896] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,898] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,898] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,898] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,898] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,899] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,900] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,901] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,903] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,903] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,903] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,904] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,905] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,907] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,909] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,912] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,912] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,915] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,916] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,916] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,919] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,921] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,921] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,921] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,921] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,923] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,924] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,925] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,925] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,925] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,926] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,926] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,927] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,927] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,927] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,928] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,928] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,931] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,935] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,935] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,938] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-23 03:40:28,940] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,941] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,941] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,946] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-23 03:40:28,947] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step33000/zero_pp_rank_11_mp_rank_01_optim_states.pt successfully saved checkpoint at iteration 33000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2769.30 iteration 33200/ 152972 | consumed samples: 11918784 | consumed tokens: 24409669632 | elapsed time per iteration (ms): 5184.7 | learning rate: 1.882E-04 | global batch size: 512 | lm loss: 1.612174E+00 | loss scale: 65536.0 | grad norm: 5627.499 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 33400/ 152972 | consumed samples: 12021184 | consumed tokens: 24619384832 | elapsed time per iteration (ms): 4641.2 | learning rate: 1.880E-04 | global batch size: 512 | lm loss: 1.547749E+00 | loss scale: 65536.0 | grad norm: 8455.130 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 33600/ 152972 | consumed samples: 12123584 | consumed 
tokens: 24829100032 | elapsed time per iteration (ms): 4648.5 | learning rate: 1.878E-04 | global batch size: 512 | lm loss: 1.530891E+00 | loss scale: 65536.0 | grad norm: 8569.950 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 33800/ 152972 | consumed samples: 12225984 | consumed tokens: 25038815232 | elapsed time per iteration (ms): 4647.7 | learning rate: 1.876E-04 | global batch size: 512 | lm loss: 1.585097E+00 | loss scale: 131072.0 | grad norm: 13989.152 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-23 04:57:51,860] [INFO] [logging.py:68:log_dist] [Rank 0] step=34000, skipped=66, lr=[0.00018738610641457488, 0.00018738610641457488], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 34000 loss: 2.3492 iter time (s): 0.002 samples/sec: 220631.935
iteration 34000/ 152972 | consumed samples: 12328384 | consumed tokens: 25248530432 | elapsed time per iteration (ms): 4648.4 | learning rate: 1.874E-04 | global batch size: 512 | lm loss: 1.601142E+00 | loss scale: 131072.0 | grad norm: 18003.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 34000 | lm loss value: 1.517044E+00 | lm loss PPL: 4.558728E+00 |
-------------------------------------------------------------------------------------------
iteration 34200/ 152972 | consumed samples: 12430784 | consumed tokens: 25458245632 | elapsed time per iteration (ms): 5179.7 | learning rate: 1.872E-04 | global batch size: 512 | lm loss: 1.560400E+00 | loss scale: 262144.0 | grad norm: 20323.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 34400/ 152972 | consumed samples: 12533184 | consumed tokens: 25667960832 | elapsed time per iteration (ms): 4648.0 | learning rate: 1.870E-04 | global batch size: 512 | lm loss: 1.624272E+00 | loss scale:
131072.0 | grad norm: 10555.250 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | saving checkpoint at iteration 34500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-23 05:38:22,848] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/mp_rank_00_model_states.pt [2021-11-23 05:38:23,268] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,277] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,278] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,280] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,280] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,281] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,282] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,286] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,286] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,287] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,287] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,287] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,287] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,288] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,288] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,289] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,289] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,289] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,292] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,293] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,295] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,296] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,296] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,296] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,297] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,298] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,300] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,300] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,301] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,303] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,305] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,305] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,305] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,308] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,308] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,309] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,309] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,311] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,311] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,311] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,312] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,312] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,316] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,316] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,316] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,317] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,317] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,318] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,318] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,319] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,320] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,320] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,321] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,322] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,322] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,323] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,323] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,324] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,325] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,325] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,328] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-23 05:38:23,333] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-23 05:38:23,334] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_18_mp_rank_00_optim_states.pt
[2021-11-23 05:38:23,340] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step34500/zero_pp_rank_3_mp_rank_01_optim_states.pt
successfully saved checkpoint at iteration 34500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2690.36
iteration 34600/ 152972 | consumed samples: 12635584 | consumed tokens: 25877676032 | elapsed time per iteration (ms): 4650.3 | learning rate: 1.868E-04 | global batch size: 512 | lm loss: 1.589945E+00 | loss scale: 32768.0 | grad norm: 4175.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 34800/ 152972 | consumed samples: 12737984 | consumed tokens: 26087391232 | elapsed time per iteration (ms): 4651.4 | learning rate: 1.865E-04 | global batch size: 512 | lm loss: 1.592304E+00 | loss scale: 32768.0 | grad norm: 3874.055 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 35000/ 152972 | consumed samples: 12840384 | consumed tokens: 26297106432 | elapsed time per iteration (ms): 4641.5 | learning rate: 1.863E-04 | global batch size: 512 | lm loss: 1.608112E+00 | loss scale: 32768.0 | grad norm: 2752.940 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 35000 | lm loss value: 1.607566E+00 | lm loss PPL: 4.990652E+00 |
-------------------------------------------------------------------------------------------
iteration 35200/ 152972 | consumed samples: 12942784 | consumed tokens: 26506821632 | elapsed time per iteration (ms): 5178.3 | learning rate: 1.861E-04 | global batch size: 512 | lm loss: 1.609262E+00 | loss scale: 65536.0 | grad norm: 6187.663 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 35400/ 152972 | consumed samples: 13045184 | consumed tokens: 26716536832 | elapsed time per iteration (ms): 4635.4 | learning rate: 1.859E-04 | global batch size: 512 | lm loss: 1.559513E+00 | loss scale: 65536.0 | grad norm: 6730.110 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 35600/ 152972 | consumed samples: 13147584 | consumed tokens: 26926252032 | elapsed time per iteration (ms): 4623.9 | learning rate: 1.857E-04 | global batch size: 512 | lm loss: 1.593631E+00 | loss scale: 65536.0 | grad norm: 7974.844 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 35800/ 152972 | consumed samples: 13249984 | consumed tokens: 27135967232 | elapsed time per iteration (ms): 4634.8 | learning rate: 1.855E-04 | global batch size: 512 | lm loss: 1.563901E+00 | loss scale: 65536.0 | grad norm: 4372.042 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-23 07:36:06,697] [INFO] [logging.py:68:log_dist] [Rank 0] step=36000, skipped=71, lr=[0.00018522966508872216, 0.00018522966508872216], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 36000 loss: 1.3660 iter time (s): 0.002 samples/sec: 220866.556
iteration 36000/ 152972 | consumed samples: 13352384 | consumed tokens: 27345682432 | elapsed time per iteration (ms): 4630.8 | learning rate: 1.852E-04 | global batch size: 512 | lm loss: 1.550529E+00 | loss scale: 65536.0 | grad norm: 5934.276 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 36000 | lm loss value: 1.531289E+00 | lm loss PPL: 4.624132E+00 |
-------------------------------------------------------------------------------------------
saving
checkpoint at iteration 36000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-23 07:37:59,379] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/mp_rank_00_model_states.pt [2021-11-23 07:37:59,807] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,808] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,809] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,810] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,812] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,814] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,816] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,817] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,817] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,818] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,819] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,820] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,820] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,820] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,821] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,821] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,822] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,823] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,823] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,824] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,825] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,825] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,829] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,830] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,830] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,831] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,833] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,833] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,836] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,836] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,837] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,839] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,841] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,841] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,843] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,844] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,846] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,846] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,847] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,847] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,847] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,850] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,850] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,850] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,850] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,850] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,851] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,852] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,852] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,853] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,853] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,853] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,854] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,854] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,854] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,855] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,856] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,861] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,862] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-23 07:37:59,865] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,866] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,869] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-23 07:37:59,883] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_17_mp_rank_00_optim_states.pt
[2021-11-23 07:37:59,884] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step36000/zero_pp_rank_12_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 36000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2754.08
iteration 36200/ 152972 | consumed samples: 13454784 | consumed tokens: 27555397632 | elapsed time per iteration (ms): 5199.9 | learning rate: 1.850E-04 | global batch size: 512 | lm loss: 1.515898E+00 | loss scale: 131072.0 | grad norm: 10063.301 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 36400/ 152972 | consumed samples: 13557184 | consumed tokens: 27765112832 | elapsed time per iteration (ms): 4641.8 | learning rate: 1.848E-04 | global batch size: 512 | lm loss: 1.632050E+00 | loss scale: 131072.0 | grad norm: 12474.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 36600/ 152972 | consumed samples: 13659584 | consumed tokens: 27974828032 | elapsed time per iteration (ms): 4650.3 | learning rate: 1.846E-04 | global batch size: 512 | lm loss: 1.614918E+00 | loss scale: 131072.0 | grad norm: 10143.922 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 36800/ 152972 | consumed samples: 13761984 | consumed tokens: 28184543232 | elapsed time per iteration (ms): 4647.8 | learning rate: 1.843E-04 | global batch size: 512 | lm loss: 1.567894E+00 | loss scale: 262144.0 | grad norm: 24930.779 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 37000/ 152972 | consumed samples: 13864384 | consumed tokens: 28394258432 | elapsed time per iteration (ms): 4646.7 | learning rate: 1.841E-04 | global batch size: 512 | lm loss: 1.594773E+00 | loss scale: 262144.0 | grad norm: 24652.810 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 37000 | lm loss value: 1.584375E+00 | lm loss PPL: 4.876243E+00 |
-------------------------------------------------------------------------------------------
iteration 37200/ 152972 | consumed samples: 13966784 | consumed tokens: 28603973632 | elapsed time per iteration (ms): 5208.2 | learning rate: 1.839E-04 | global batch size: 512 | lm loss: 1.596863E+00 | loss scale: 65536.0 | grad norm: 7546.910 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 37400/ 152972 | consumed samples: 14069184 | consumed tokens: 28813688832 | elapsed time per iteration (ms): 4642.1 | learning rate: 1.836E-04 | global batch size: 512 | lm loss: 1.601100E+00 | loss scale: 65536.0 | grad norm: 7325.765 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 37500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-23 09:36:01,195] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/mp_rank_00_model_states.pt
[2021-11-23 09:36:01,623] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_26_mp_rank_00_optim_states.pt
[2021-11-23 09:36:01,625] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_3_mp_rank_00_optim_states.pt
[2021-11-23 09:36:01,627]
[INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,627] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,627] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,628] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,629] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,631] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,633] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,633] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,634] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,636] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,637] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,638] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,638] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,639] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,641] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,642] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,643] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,643] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,644] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,645] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,645] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,646] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,652] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,653] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,654] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,657] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,657] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,658] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,659] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,660] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,660] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,661] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,661] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,662] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,667] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,667] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,667] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,669] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,669] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,671] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,672] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,672] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,672] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,673] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,673] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,673] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,674] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,677] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,678] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,679] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,679] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,680] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,681] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,681] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,689] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,693] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,694] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,694] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-23 09:36:01,695] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,695] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,697] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-23 09:36:01,699] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37500/zero_pp_rank_5_mp_rank_01_optim_states.pt successfully saved checkpoint at iteration 37500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2683.31 saving checkpoint at iteration 37526 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-23 09:38:04,834] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/mp_rank_00_model_states.pt [2021-11-23 09:38:05,254] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,260] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,261] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,263] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,263] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,263] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,266] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,267] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,267] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,267] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,267] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,270] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,270] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,271] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,271] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,272] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,272] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,272] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,273] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,274] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,275] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,277] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,277] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,279] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,280] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,281] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,285] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,286] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,286] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,288] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,288] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,291] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,291] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,293] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,294] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,295] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,298] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,300] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,300] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,301] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,301] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,301] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,302] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,302] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,304] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,304] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,305] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,306] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,307] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,310] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,311] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,311] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,312] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,315] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,316] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,317] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,318] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,318] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,320] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,322] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,322] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-23 09:38:05,326] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,327] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-23 09:38:05,335] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step37526/zero_pp_rank_5_mp_rank_01_optim_states.pt successfully saved checkpoint at iteration 37526 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2705.50 [exiting 
program after 1190.0478609402974 minutes] datetime: 2021-11-23 09:38:05 ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. *****************************************
-------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed.
Op compatibility means that your system meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible -------------------------------------------------- ninjacpu_adam ................................. [YES] [OKAY]...... [OKAY] -------------------------------------------------- op name ................ installed .. compatible fused_adam-------------------------------------------------- ............. [YES] ...... [OKAY] ninjacpu_adam ..................fused_lamb ............... ninja[OKAY] .............[YES] --------------------------------------------------[YES] ........................ op name[OKAY]......[OKAY] ................--------------------------------------------------[OKAY] ninja ninja.................. [OKAY].................. [OKAY]-------------------------------------------------- installed op name..fused_adam compatible............................. -------------------------------------------------- installed --------------------------------------------------op name ................op name installed................ ..installed compatible.. --------------------------------------------------compatible [YES] ........sparse_attn compatible[OKAY]............ cpu_adam -------------------------------------------------- --------------------------------------------------[NO]............... .......fused_lamb[YES] [OKAY]............. cpu_adam ...............cpu_adam [YES]............... [YES]...... ......[OKAY] [OKAY] ...... [YES][OKAY]cpu_adamtransformer ................................. [OKAY][YES][YES] fused_adam fused_adam............. .............[YES] [YES]...... ......[OKAY] [OKAY] ......fused_adam...... [OKAY][OKAY]............. fused_lambfused_lamb .......................... [YES][YES] ............ [OKAY][OKAY] [YES] ...... sparse_attnstochastic_transformer[OKAY] .............fused_adamfused_lamb [NO][YES].......................... ...... .......[YES] [YES] [OKAY] [OKAY] ............ sparse_attnsparse_attn ........................ [NO][NO] .............. [OKAY][OKAY] [OKAY]transformer[OKAY] ............ [YES] fused_lamb...... ............. 
[OKAY][YES] transformertransformer ........................ [YES][YES] ............ [OKAY][OKAY] ...... [OKAY]stochastic_transformer stochastic_transformerstochastic_transformer .. [YES][YES] ............ [OKAY][OKAY] sparse_attn. ............ [YES][NO] ............. [OKAY][OKAY]sparse_attn transformer............ ............[NO] [YES]....... [OKAY]...... [OKAY] transformer ............ stochastic_transformer[YES] ....... [YES][OKAY] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] ninja .................. [OKAY] -------------------------------------------------- ninjaop name .................................. installed[OKAY] .. ninja .................. [OKAY] compatible-------------------------------------------------- --------------------------------------------------op name ................ installed .. compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ......ninja [OKAY] cpu_adam.................. [OKAY]............... ninja .................. [OKAY] --------------------------------------------------ninja op name ................ installed .. compatible -------------------------------------------------- [YES]--------------------------------------------------fused_adam op name .................................. [OKAY]installed .. --------------------------------------------------compatible op name-------------------------------------------------- ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] ......op name ............. [OKAY] ................ninja [YES] installed .......................... [OKAY][OKAY]fused_adamcompatible .............----------------------------------------------------------------------------------------------------fused_lamb fused_adam ............. 
ninja[YES] ........................ [OKAY][OKAY] fused_adam ............. [YES] ...... [OKAY] [YES] ...................op name [YES][OKAY]................ fused_adamninja fused_lambninja............. .................. .............[YES] .................. [YES][OKAY]...... [OKAY][OKAY] -------------------------------------------------- ......installed cpu_adam [OKAY]fused_lamb .. ............... ............. compatible[YES] [YES]...... -------------------------------------------------- ...... ...... --------------------------------------------------[OKAY]-------------------------------------------------- fused_lamb [OKAY] [OKAY] op name fused_lamb................ .............installed ninja [YES].. ninja........................compatible ..................[OKAY][OKAY] sparse_attn ............cpu_adam [NO]...............fused_adam [YES].................... [YES]......[OKAY]sparse_attn --------------------------------------------------[OKAY] -------------------------------------------------- --------------------------------------------------op name op name.............op name ................[YES] ................ installed ...... sparse_attninstalled .. [OKAY]............ ......[OKAY]............ transformer[OKAY][NO] ................... [YES]fused_lamb[OKAY] .. [NO]compatible compatible ....... -------------------------------------------------- -------------------------------------------------- [OKAY] ................op name cpu_adaminstalled ............................... ..installedsparse_attn[YES] compatible ........ ............ [OKAY] --------------------------------------------------[NO]compatible fused_adam................... transformer .............[OKAY] [YES] .......-------------------------------------------------- ............ [YES] ...... [YES]......stochastic_transformer[OKAY] [OKAY]....... [YES][OKAY] ......fused_lamb [OKAY]............. sparse_attn transformer............ ............[NO]cpu_adam cpu_adam[YES] ....... ............... 
......[OKAY]............... [OKAY] [YES] [OKAY] transformer[YES] stochastic_transformer [YES]. ......[YES]sparse_attn [OKAY].................. [OKAY][NO] ...... ..................stochastic_transformer[OKAY] [YES] .[OKAY] fused_adam cpu_adam............. transformer[YES]............... cpu_adam .................. [YES]............... [OKAY][YES][YES]...... ............[OKAY]fused_lamb [OKAY] [OKAY] ....... [OKAY] ......[YES] [OKAY]...... fused_adam[OKAY] transformer ............ [YES]sparse_attn .................. [OKAY][NO] ....... [OKAY]stochastic_transformer .............fused_adamstochastic_transformer [YES].............. ......[YES] [YES][OKAY]...... ............. [YES] ...... [OKAY]stochastic_transformer . [YES]transformer .................. [OKAY] ......[OKAY] fused_lamb [YES] ...... [OKAY] [OKAY]............. [YES] fused_lamb...... .............[OKAY] fused_adam fused_adam ............. . ............. [YES][YES][YES] .................. sparse_attn[OKAY] [OKAY]............ stochastic_transformer . [YES] ...... [OKAY] [YES] ...... [OKAY] [OKAY] fused_lamb [NO] fused_lamb ................................. [YES] [OKAY] [YES] ............ transformer[OKAY][OKAY] ............ [YES] ...... [OKAY] sparse_attnsparse_attn ........................ [NO][NO] .............. [OKAY][OKAY] transformertransformer ........................ [YES][YES] ............ [OKAY][OKAY] stochastic_transformer . sparse_attn[YES]sparse_attn .............................. [OKAY][NO] [NO]....... [OKAY]....... [OKAY] stochastic_transformerstochastic_transformer .. [YES][YES] ............ [OKAY][OKAY] ninja ninja.................. [OKAY].................. [OKAY]-------------------------------------------------- transformer ............transformer [YES] .................. [OKAY][YES] --------------------------------------------------op name ...... [OKAY] ................op name installed................ ..installed compatible.. stochastic_transformer . stochastic_transformer[YES] ....... 
[OKAY] --------------------------------------------------compatible -------------------------------------------------- [YES] ...... [OKAY] cpu_adam ............... ninjaninja[YES] .......................................... [OKAY][OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op namecpu_adam op name ................ ...............fused_adam ................ installed [YES] ............. installed.. ...... [YES] compatible .. ...... op name ninja................ installed.................. [OKAY].. --------------------------------------------------compatible op name-------------------------------------------------- [OKAY] -------------------------------------------------- compatible[OKAY] --------------------------------------------------fused_lamb ninja................ installed .................... cpu_adam[OKAY]compatible ............... --------------------------------------------------ninja --------------------------------------------------[YES] ........................ op name [OKAY] [OKAY] ............. [YES] fused_adam......cpu_adam cpu_adam.............[OKAY]............... cpu_adam................-------------------------------------------------- ninja .................. [OKAY]ninja --------------------------------------------------.................. ...............[YES] [YES]...... [YES] ...... [OKAY] ...... [OKAY] [OKAY] installed op namefused_adam ............... ................ .. [YES].............installed compatible ..[YES] ...... compatible......-------------------------------------------------- -------------------------------------------------- [OKAY]op name ................ --------------------------------------------------installed ..op name compatible................ --------------------------------------------------installed ninja.. compatible.................. 
[OKAY]--------------------------------------------------cpu_adam ...............-------------------------------------------------- sparse_attn ............fused_adam fused_lambfused_adam [NO]............. ................................. [YES] [YES][YES][OKAY] [OKAY] [OKAY] [YES] ninjaop name...... ................cpu_adam [OKAY]..................installed [OKAY]............... .. --------------------------------------------------compatible[YES] .................. transformer[OKAY][OKAY][OKAY] ............ cpu_adam ............... fused_lambcpu_adam[YES] ............. fused_adam............... ......[YES] ...... .............[OKAY][YES][OKAY] [YES] fused_adam op name -------------------------------------------------- ...... ............................. [OKAY][YES]installed [YES]fused_lamb fused_lamb ...... ............. .............[OKAY] [YES] ...... [OKAY] ...... [OKAY] ..cpu_adam...... ...............compatible [OKAY][YES]fused_adam -------------------------------------------------- [YES] ............ stochastic_transformer[OKAY][OKAY] sparse_attn fused_adam fused_lamb............. sparse_attn ............. [YES]............fused_adam [NO]................... [YES] [OKAY][YES] ....... ...... fused_lamb......[OKAY][OKAY] ......fused_lamb ............. ............. [OKAY] [YES] [YES] cpu_adam...... ...... ............... [OKAY] [OKAY] [YES] . ............[YES] ...... [OKAY][NO] ....... [OKAY] [OKAY]............. fused_adam fused_lamb ...... ............. [OKAY].............[YES] sparse_attntransformersparse_attn ........................ [NO][NO] .......................... [YES] [OKAY][OKAY] transformer[YES] .................. fused_lamb [OKAY] [YES] ............. ......[YES] ......[OKAY] sparse_attn[OKAY] ......[YES] sparse_attn[OKAY]...... transformer......transformer ........................ [YES][YES][OKAY] fused_adam............[OKAY] ............ [OKAY][OKAY] ............ stochastic_transformersparse_attn[NO] .................... [YES][NO] ...... 
.......sparse_attn[OKAY][OKAY] fused_lamb.............[NO] .............[YES]....... [YES]...... [OKAY] ...... stochastic_transformer . stochastic_transformerstochastic_transformer[YES] ........ [YES][YES] ............ [OKAY] [OKAY] [OKAY] ............[OKAY] [NO] .......transformer [OKAY]transformer............ [OKAY] sparse_attntransformer[OKAY] [YES]............ [YES]transformer ........................ [YES][OKAY][OKAY] fused_lamb........................ .............[NO][YES] [YES] ................... [OKAY][OKAY] ...... [OKAY] [OKAY]sparse_attn stochastic_transformer . stochastic_transformer[YES]stochastic_transformer ....... . [OKAY] [YES] [YES] ............ [OKAY][OKAY] ............ transformer[NO] stochastic_transformer ................... . [OKAY] [YES] sparse_attn[YES] ........................transformer [OKAY] [NO] ............[OKAY] .......stochastic_transformer[YES] [OKAY]....... [OKAY][YES] transformer ..................stochastic_transformer [OKAY].[YES] [YES]...... [OKAY]...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible ninja-------------------------------------------------- ninja.................. [OKAY].................. [OKAY]-------------------------------------------------- ninjacpu_adam -------------------------------------------------- op name................................. op name ................ [OKAY][YES] ................ installed ...... installed--------------------------------------------------..[OKAY] ..op namecompatible compatible................-------------------------------------------------- installed--------------------------------------------------fused_adam ............... compatible[YES] ......--------------------------------------------------cpu_adam [OKAY] cpu_adam............... ...............[YES]fused_lamb [YES]cpu_adam...... ............. ............... [OKAY][YES]...... [YES][OKAY]...... ......[OKAY] [OKAY]fused_adam ............. [YES]fused_adam ...... .............[OKAY]fused_adam [YES]sparse_attn.............fused_lamb ............[YES]................... [NO]......[YES] [OKAY] [OKAY]...... ....... 
fused_lamb[OKAY][OKAY]fused_lamb .......................... transformer [YES] [YES]............ ...... ......[YES][OKAY] sparse_attn [OKAY] ...... ............ [OKAY][NO] ....... stochastic_transformer[OKAY] . [YES]sparse_attn transformer ......sparse_attn ............ ............ ............[OKAY][NO] [YES] ....... [NO] ...... [OKAY] ....... [OKAY] [OKAY] transformer ............stochastic_transformer transformer [YES] ................... [YES][OKAY][YES] ............ [OKAY][OKAY] stochastic_transformer . [YES] stochastic_transformer...... [OKAY]. [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. 
compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
[WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ...............async_io [NO] ............... [NO] ....... .......[NO] [NO] transformer_inferencetransformer_inference .. [NO] ......... [OKAY][NO] ....... [OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY]  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY]  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
[WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ...........
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] .......
[NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] .......
[NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ...............
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 
0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ......
torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ...................
0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ...................
0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ...........
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ...............
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ...........
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------
[WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ......
torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 
1.8.21.8.2 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed infodeepspeed info ...................................... 0.5.5+58a8e13, 58a8e13, master0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 
0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer ..............  [WARNING]  async_io: please install the libaio-devel package with yum[NO] ....... [OKAY] --------------------------------------------------  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 
0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: DeepSpeed general environment info: torch install path ............... torch install path ............... 
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'].................... 1.8.2 torch version torch cuda version.................... ...............1.8.2 11.1 nvcc versiontorch cuda version .................................... 11.211.1 deepspeed install pathnvcc version ................................ 11.2['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed install pathdeepspeed info .............................. 0.5.5+58a8e13, 58a8e13, master['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed wheel compiled w.deepspeed info ......................... torch 1.8, cuda 11.10.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.21.8.2 torch cuda versiontorch cuda version .............................. 
11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed infodeepspeed info ...................................... 0.5.5+58a8e13, 58a8e13, master0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... DeepSpeed general environment info: ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch versiontorch install path ................................... 1.8.2 torch cuda version ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']11.1 nvcc version torch version..................... ....................11.2 1.8.2deepspeed install path ...........torch cuda version ...............['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] 11.1deepspeed info nvcc version................... .....................0.5.5+58a8e13, 58a8e13, master 11.2deepspeed wheel compiled w. deepspeed install path...... ...........torch 1.8, cuda 11.1 ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ******** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ******** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 
git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: 
git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
using world size: 64, data-parallel-size: 32, tensor-model-parallel size: 1, pipeline-model-parallel size: 2
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. None
consumed_train_samples .......................... 0
consumed_train_tokens ........................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
curriculum_learning ............................. False
data_impl ....................................... mmap
data_parallel_size .............................. 32
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.2132781.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 100
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 8192
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
gigaflos_no_embeds .............................. 0
global_batch_size ............................... 512
glu_activation .................................. None
hidden_dropout .................................. 0.1
hidden_size ..................................... 2048
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 128
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 200
log_learning_rate_to_tensorboard ................ True
log_level ....................................... None
log_level_replica ............................... None
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_on_targets_only ............................ True
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 0.0002
lr_decay_iters .................................. None
lr_decay_samples ................................ 73242187
lr_decay_style .................................. cosine
lr_decay_tokens ................................. None
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 183105
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/data/gpt2-merges.txt
micro_batch_size ................................ 1
min_loss_scale .................................. 1.0
min_lr .......................................... 1e-05
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 16
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 24
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 2
position_embedding_type ......................... PositionEmbeddingType.absolute
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['32', '32', '2_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids ..............................
False retriever_report_topk_accuracies ................ [] retriever_score_scaling ......................... False retriever_seq_length ............................ 256 reweight_loss_based_on_position_frequency ....... True sample_rate ..................................... 1.0 save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints save_interval ................................... 1500 scatter_gather_tensors_in_pipeline .............. True scattered_embeddings ............................ False seed ............................................ 1234 seq_length ...................................... 2048 sgd_momentum .................................... 0.9 short_seq_prob .................................. 0.1 split ........................................... 949,50,1 split_transformers .............................. False synchronize_each_layer .......................... False tensor_model_parallel_size ...................... 1 tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/tr6g-1B3-oscar-loss-reweighting-logs/tensorboard tensorboard_log_interval ........................ 1 tensorboard_queue_size .......................... 5 test_weighted_split_names ....................... None test_weighted_split_paths ....................... None test_weighted_split_splits ...................... None test_weighted_split_weights ..................... None tile_factor ..................................... 1 titles_data_path ................................ None tokenizer_name_or_path .......................... None tokenizer_type .................................. GPT2BPETokenizer train_iters ..................................... None train_samples ................................... 73242187 train_tokens .................................... None train_weighted_split_paths ...................... 
None use_bnb_optimizer ............................... False use_checkpoint_lr_scheduler ..................... False use_contiguous_buffers_in_ddp ................... False use_cpu_initialization .......................... None use_one_sent_docs ............................... False use_pin_memory .................................. False valid_weighted_split_names ...................... None valid_weighted_split_paths ...................... None valid_weighted_split_splits ..................... None valid_weighted_split_weights .................... None virtual_pipeline_model_parallel_size ............ None vocab_extra_ids ................................. 0 vocab_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/data/gpt2-vocab.json weight_decay .................................... 0.1 world_size ...................................... 64 zero_allgather_bucket_size ...................... 0.0 zero_contigious_gradients ....................... False zero_reduce_bucket_size ......................... 0.0 zero_reduce_scatter ............................. False zero_stage ...................................... 1 -------------------- end of arguments --------------------- will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples. > building GPT2BPETokenizer tokenizer ... 
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) > initializing torch distributed ... > setting tensorboard ... > initializing tensor model parallel with size 1 > initializing pipeline model parallel with size 2 > setting random seeds to 1234 ... [2021-11-23 09:38:37,957] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 > compiling dataset index builder ... make: Entering directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/data' make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/data' >>> done with dataset index builder. Compilation time: 0.328 seconds > compiling and loading fused kernels ... /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module scaled_upper_triang_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module scaled_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module scaled_masked_softmax_cuda... Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module fused_mix_prec_layer_norm_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels.
Compilation time: 23.158 seconds time to initialize megatron (seconds): 37.444 [after megatron is initialized] datetime: 2021-11-23 09:39:01 building GPT model ... [2021-11-23 09:39:01,491] [INFO] [utils.py:806:see_memory_usage] Before Building Model [2021-11-23 09:39:01,492] [INFO] [utils.py:807:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB [2021-11-23 09:39:01,492] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 38.56 GB, percent = 20.6% SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pipe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=1, data=0, model=0): 32, ProcessCoord(pipe=1, 
data=1, model=0): 33, ProcessCoord(pipe=1, data=2, model=0): 34, ProcessCoord(pipe=1, data=3, model=0): 35, ProcessCoord(pipe=1, data=4, model=0): 36, ProcessCoord(pipe=1, data=5, model=0): 37, ProcessCoord(pipe=1, data=6, model=0): 38, ProcessCoord(pipe=1, data=7, model=0): 39, ProcessCoord(pipe=1, data=8, model=0): 40, ProcessCoord(pipe=1, data=9, model=0): 41, ProcessCoord(pipe=1, data=10, model=0): 42, ProcessCoord(pipe=1, data=11, model=0): 43, ProcessCoord(pipe=1, data=12, model=0): 44, ProcessCoord(pipe=1, data=13, model=0): 45, ProcessCoord(pipe=1, data=14, model=0): 46, ProcessCoord(pipe=1, data=15, model=0): 47, ProcessCoord(pipe=1, data=16, model=0): 48, ProcessCoord(pipe=1, data=17, model=0): 49, ProcessCoord(pipe=1, data=18, model=0): 50, ProcessCoord(pipe=1, data=19, model=0): 51, ProcessCoord(pipe=1, data=20, model=0): 52, ProcessCoord(pipe=1, data=21, model=0): 53, ProcessCoord(pipe=1, data=22, model=0): 54, ProcessCoord(pipe=1, data=23, model=0): 55, ProcessCoord(pipe=1, data=24, model=0): 56, ProcessCoord(pipe=1, data=25, model=0): 57, ProcessCoord(pipe=1, data=26, model=0): 58, ProcessCoord(pipe=1, data=27, model=0): 59, ProcessCoord(pipe=1, data=28, model=0): 60, ProcessCoord(pipe=1, data=29, model=0): 61, ProcessCoord(pipe=1, data=30, model=0): 62, ProcessCoord(pipe=1, data=31, model=0): 63} [2021-11-23 09:39:02,790] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer stage=0 layers=15 0: _to_float16 1: EmbeddingPipe 2: 3: ParallelTransformerLayerPipe 4: ParallelTransformerLayerPipe 5: ParallelTransformerLayerPipe 6: ParallelTransformerLayerPipe 7: ParallelTransformerLayerPipe 8: ParallelTransformerLayerPipe 9: ParallelTransformerLayerPipe 10: ParallelTransformerLayerPipe 11: ParallelTransformerLayerPipe 12: ParallelTransformerLayerPipe 13: ParallelTransformerLayerPipe 14: ParallelTransformerLayerPipe stage=1 layers=17 15: ParallelTransformerLayerPipe 16: ParallelTransformerLayerPipe 17: 
ParallelTransformerLayerPipe 18: ParallelTransformerLayerPipe 19: ParallelTransformerLayerPipe 20: ParallelTransformerLayerPipe 21: ParallelTransformerLayerPipe 22: ParallelTransformerLayerPipe 23: ParallelTransformerLayerPipe 24: ParallelTransformerLayerPipe 25: ParallelTransformerLayerPipe 26: ParallelTransformerLayerPipe 27: 28: 29: MixedFusedLayerNorm 30: EmbeddingPipe 31: float16_to_fp32 loss: CrossEntropy > number of parameters on (tensor, pipeline) model parallel rank (0, 1): 711520256 [2021-11-23 09:39:03,322] [INFO] [utils.py:806:see_memory_usage] After Building Model [2021-11-23 09:39:03,323] [INFO] [utils.py:807:see_memory_usage] MA 1.33 GB Max_MA 1.33 GB CA 1.36 GB Max_CA 1 GB [2021-11-23 09:39:03,323] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 38.78 GB, percent = 20.7% > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 711516160 setting training iterations to 152972 > learning rate decay style: cosine DeepSpeed is enabled. [2021-11-23 09:39:03,343] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.5.5+58a8e13, git-hash=58a8e13, git-branch=master [2021-11-23 09:39:03,658] [INFO] [engine.py:207:__init__] DeepSpeed Flops Profiler Enabled: False [2021-11-23 09:39:03,658] [INFO] [engine.py:862:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer [2021-11-23 09:39:03,658] [INFO] [engine.py:868:_configure_optimizer] Using client Optimizer as basic optimizer [2021-11-23 09:39:03,662] [INFO] [engine.py:884:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam [2021-11-23 09:39:03,662] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type= [2021-11-23 09:39:03,662] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer [2021-11-23 09:39:03,662] [INFO] [stage2.py:111:__init__] Reduce bucket size 500000000 [2021-11-23 09:39:03,662] [INFO] [stage2.py:112:__init__] Allgather bucket size 500000000 
[2021-11-23 09:39:03,662] [INFO] [stage2.py:113:__init__] CPU Offload: False [2021-11-23 09:39:03,662] [INFO] [stage2.py:114:__init__] Round robin gradient partitioning: False Rank: 14 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 43 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 6 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 26 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 46 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 3 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 38 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 58 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 31 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 40 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 35 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 57 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 0 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 50 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 5 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 47 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 37 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 11 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 29 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 13 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 25 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 32 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 21 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 53 partition 
count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 16 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 10 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 1 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 19 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 54 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 49 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 22 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 41 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 60 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 4 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 36 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 44 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 61 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 33 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 2 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 45 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 9 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 24 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 12 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 7 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 48 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 8 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 34 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 42 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 39 partition count [32, 32] and sizes[(22224896, 
False), (10112, False)] Rank: 28 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 55 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 52 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 56 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 63 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 59 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 20 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 51 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 15 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 17 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 27 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 30 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 62 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 18 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 23 partition count [32, 32] and sizes[(22224896, False), (9984, False)] [2021-11-23 09:39:05,241] [INFO] [utils.py:806:see_memory_usage] Before initializing optimizer states [2021-11-23 09:39:05,242] [INFO] [utils.py:807:see_memory_usage] MA 1.41 GB Max_MA 1.45 GB CA 2.71 GB Max_CA 3 GB [2021-11-23 09:39:05,242] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.56 GB, percent = 21.7% [2021-11-23 09:39:05,278] [INFO] [utils.py:806:see_memory_usage] After initializing optimizer states [2021-11-23 09:39:05,279] [INFO] [utils.py:807:see_memory_usage] MA 1.57 GB Max_MA 1.66 GB CA 2.96 GB Max_CA 3 GB [2021-11-23 09:39:05,279] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.71 GB, percent = 21.7% [2021-11-23 09:39:05,279] [INFO] [stage2.py:474:__init__] optimizer state initialized [2021-11-23 09:39:05,305] [INFO] 
[utils.py:806:see_memory_usage] After initializing ZeRO optimizer [2021-11-23 09:39:05,306] [INFO] [utils.py:807:see_memory_usage] MA 1.57 GB Max_MA 1.57 GB CA 2.96 GB Max_CA 3 GB [2021-11-23 09:39:05,306] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.72 GB, percent = 21.7% [2021-11-23 09:39:05,306] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [2021-11-23 09:39:05,306] [INFO] [engine.py:599:_configure_lr_scheduler] DeepSpeed using client LR scheduler [2021-11-23 09:39:05,306] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2021-11-23 09:39:05,306] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)] [2021-11-23 09:39:05,306] [INFO] [config.py:940:print] DeepSpeedEngine configuration: [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] allreduce_always_fp32 ........ False [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] amp_enabled .................. False [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] amp_params ................... False [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] checkpoint_tag_validation_enabled True [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] checkpoint_tag_validation_fail False [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] curriculum_enabled ........... False [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] curriculum_params ............ 
False [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] dataloader_drop_last ......... False [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] disable_allgather ............ False [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] dump_state ................... False [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1} [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] eigenvalue_enabled ........... False [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] eigenvalue_gas_boundary_resolution 1 [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] eigenvalue_layer_name ........ bert.encoder.layer [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] eigenvalue_layer_num ......... 0 [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] eigenvalue_max_iter .......... 100 [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] eigenvalue_stability ......... 1e-06 [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] eigenvalue_tol ............... 0.01 [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] eigenvalue_verbose ........... False [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] elasticity_enabled ........... False [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] fp16_enabled ................. True [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] fp16_master_weights_and_gradients False [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] fp16_mixed_quantize .......... False [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] global_rank .................. 0 [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] gradient_accumulation_steps .. 
16 [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] gradient_clipping ............ 1.0 [2021-11-23 09:39:05,307] [INFO] [config.py:944:print] gradient_predivide_factor .... 1.0 [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] initial_dynamic_scale ........ 4096 [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] loss_scale ................... 0 [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] memory_breakdown ............. False [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] optimizer_legacy_fusion ...... False [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] optimizer_name ............... None [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] optimizer_params ............. None [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] pld_enabled .................. False [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] pld_params ................... False [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] prescale_gradients ........... False [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] quantize_change_rate ......... 0.001 [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] quantize_groups .............. 1 [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] quantize_offset .............. 1000 [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] quantize_period .............. 1000 [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] quantize_rounding ............ 0 [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] quantize_start_bits .......... 16 [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] quantize_target_bits ......... 8 [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] quantize_training_enabled .... False [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] quantize_type ................ 
0 [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] quantize_verbose ............. False [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] scheduler_name ............... None [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] scheduler_params ............. None [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] sparse_attention ............. None [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] sparse_gradients_enabled ..... False [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] steps_per_print .............. 2000 [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] tensorboard_enabled .......... False [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] tensorboard_job_name ......... DeepSpeedJobName [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] tensorboard_output_path ...... [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] train_batch_size ............. 512 [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] train_micro_batch_size_per_gpu 1 [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] use_quantizer_kernel ......... False [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] wall_clock_breakdown ......... False [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] world_size ................... 32 [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] zero_allow_untested_optimizer False [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] zero_config .................. 
{ "stage": 1, "contiguous_gradients": true, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false } [2021-11-23 09:39:05,308] [INFO] [config.py:944:print] zero_enabled ................. True [2021-11-23 09:39:05,309] [INFO] [config.py:944:print] zero_optimization_stage ...... 1 [2021-11-23 09:39:05,309] [INFO] [config.py:946:print] json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 512, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false } [2021-11-23 09:39:05,309] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=16 micro_batch_size=1 [2021-11-23 09:39:05,338] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=15 [0, 15) STAGE_PARAMS=711516160 (711.516M) TOTAL_PARAMS=1423036416 (1423.036M) UNIQUE_PARAMS=1315819520 (1315.820M) [2021-11-23 09:39:05,338] [INFO] [engine.py:151:__init__] RANK=32 STAGE=1 LAYERS=17 [15, 32) STAGE_PARAMS=711520256 (711.520M) TOTAL_PARAMS=1423036416 (1423.036M) UNIQUE_PARAMS=1315819520 (1315.820M) > using checkpoint value 0.0002 for learning rate > using checkpoint value 1e-05 for minimum learning rate > using checkpoint value 183105 for warmup iterations > using checkpoint value 73242187 for total number of iterations > using checkpoint value cosine for decay style successfully loaded 32 ZeRO 
state_dicts for rank 53 successfully loaded 32 ZeRO state_dicts for rank 60 successfully loaded 32 ZeRO state_dicts for rank 54 successfully loaded 32 ZeRO state_dicts for rank 61 successfully loaded 32 ZeRO state_dicts for rank 57 successfully loaded 32 ZeRO state_dicts for rank 63 successfully loaded 32 ZeRO state_dicts for rank 62 successfully loaded 32 ZeRO state_dicts for rank 59 successfully loaded 32 ZeRO state_dicts for rank 56 successfully loaded 32 ZeRO state_dicts for rank 58 successfully loaded 32 ZeRO state_dicts for rank 51 successfully loaded 32 ZeRO state_dicts for rank 48 successfully loaded 32 ZeRO state_dicts for rank 50 successfully loaded 32 ZeRO state_dicts for rank 49 successfully loaded 32 ZeRO state_dicts for rank 34 successfully loaded 32 ZeRO state_dicts for rank 33 successfully loaded 32 ZeRO state_dicts for rank 35 successfully loaded 32 ZeRO state_dicts for rank 42 successfully loaded 32 ZeRO state_dicts for rank 40 successfully loaded 32 ZeRO state_dicts for rank 45 successfully loaded 32 ZeRO state_dicts for rank 47 successfully loaded 32 ZeRO state_dicts for rank 32 successfully loaded 32 ZeRO state_dicts for rank 41 successfully loaded 32 ZeRO state_dicts for rank 43 successfully loaded 32 ZeRO state_dicts for rank 46 successfully loaded 32 ZeRO state_dicts for rank 44 successfully loaded 32 ZeRO state_dicts for rank 52 successfully loaded 32 ZeRO state_dicts for rank 55 successfully loaded 32 ZeRO state_dicts for rank 29 successfully loaded 32 ZeRO state_dicts for rank 6 successfully loaded 32 ZeRO state_dicts for rank 14 successfully loaded 32 ZeRO state_dicts for rank 11 successfully loaded 32 ZeRO state_dicts for rank 30 successfully loaded 32 ZeRO state_dicts for rank 4 successfully loaded 32 ZeRO state_dicts for rank 2 successfully loaded 32 ZeRO state_dicts for rank 8 successfully loaded 32 ZeRO state_dicts for rank 1 successfully loaded 32 ZeRO state_dicts for rank 22 successfully loaded 32 ZeRO state_dicts for rank 23
successfully loaded 32 ZeRO state_dicts for rank 28 successfully loaded 32 ZeRO state_dicts for rank 12 successfully loaded 32 ZeRO state_dicts for rank 31 successfully loaded 32 ZeRO state_dicts for rank 24 successfully loaded 32 ZeRO state_dicts for rank 27 successfully loaded 32 ZeRO state_dicts for rank 3 successfully loaded 32 ZeRO state_dicts for rank 25 successfully loaded 32 ZeRO state_dicts for rank 0 successfully loaded 32 ZeRO state_dicts for rank 21 successfully loaded 32 ZeRO state_dicts for rank 10 successfully loaded 32 ZeRO state_dicts for rank 20 successfully loaded 32 ZeRO state_dicts for rank 26 successfully loaded 32 ZeRO state_dicts for rank 9 successfully loaded 32 ZeRO state_dicts for rank 13 successfully loaded 32 ZeRO state_dicts for rank 15 successfully loaded 32 ZeRO state_dicts for rank 18 successfully loaded 32 ZeRO state_dicts for rank 16 successfully loaded 32 ZeRO state_dicts for rank 19 successfully loaded 32 ZeRO state_dicts for rank 17 successfully loaded 32 ZeRO state_dicts for rank 5 successfully loaded 32 ZeRO state_dicts for rank 7 successfully loaded 32 ZeRO state_dicts for rank 39 successfully loaded 32 ZeRO state_dicts for rank 36 successfully loaded 32 ZeRO state_dicts for rank 37 successfully loaded 32 ZeRO state_dicts for rank 38 loading 32 zero partition checkpoints for rank 53 loading 32 zero partition checkpoints for rank 60 loading 32 zero partition checkpoints for rank 59 loading 32 zero partition checkpoints for rank 63 loading 32 zero partition checkpoints for rank 57 loading 32 zero partition checkpoints for rank 47 loading 32 zero partition checkpoints for rank 41 loading 32 zero partition checkpoints for rank 45 loading 32 zero partition checkpoints for rank 50 loading 32 zero partition checkpoints for rank 44 loading 32 zero partition checkpoints for rank 42 loading 32 zero partition checkpoints for rank 43 loading 32 zero partition checkpoints for rank 62 loading 32 zero partition checkpoints for rank 35
loading 32 zero partition checkpoints for rank 61 loading 32 zero partition checkpoints for rank 56 loading 32 zero partition checkpoints for rank 34 loading 32 zero partition checkpoints for rank 40 loading 32 zero partition checkpoints for rank 49 loading 32 zero partition checkpoints for rank 46 loading 32 zero partition checkpoints for rank 48 loading 32 zero partition checkpoints for rank 33 loading 32 zero partition checkpoints for rank 51 loading 32 zero partition checkpoints for rank 54 loading 32 zero partition checkpoints for rank 32 loading 32 zero partition checkpoints for rank 58 loading 32 zero partition checkpoints for rank 6 loading 32 zero partition checkpoints for rank 29 loading 32 zero partition checkpoints for rank 14 loading 32 zero partition checkpoints for rank 4 loading 32 zero partition checkpoints for rank 11 loading 32 zero partition checkpoints for rank 20 loading 32 zero partition checkpoints for rank 25 loading 32 zero partition checkpoints for rank 24 loading 32 zero partition checkpoints for rank 9 loading 32 zero partition checkpoints for rank 1 loading 32 zero partition checkpoints for rank 10 loading 32 zero partition checkpoints for rank 21 loading 32 zero partition checkpoints for rank 3 loading 32 zero partition checkpoints for rank 30 loading 32 zero partition checkpoints for rank 12 loading 32 zero partition checkpoints for rank 31 loading 32 zero partition checkpoints for rank 8 loading 32 zero partition checkpoints for rank 22 loading 32 zero partition checkpoints for rank 0 loading 32 zero partition checkpoints for rank 2 loading 32 zero partition checkpoints for rank 28 loading 32 zero partition checkpoints for rank 23 loading 32 zero partition checkpoints for rank 27 loading 32 zero partition checkpoints for rank 15 loading 32 zero partition checkpoints for rank 26 checkpoint version 3.0 loading 32 zero partition checkpoints for rank 5 loading 32 zero partition checkpoints for rank 39 loading 32 zero partition 
checkpoints for rank 16 loading 32 zero partition checkpoints for rank 7 loading 32 zero partition checkpoints for rank 37 loading 32 zero partition checkpoints for rank 38 loading 32 zero partition checkpoints for rank 18 loading 32 zero partition checkpoints for rank 36 loading 32 zero partition checkpoints for rank 13 loading 32 zero partition checkpoints for rank 19 loading 32 zero partition checkpoints for rank 17 loading 32 zero partition checkpoints for rank 52 loading 32 zero partition checkpoints for rank 55 successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints at iteration 37526 time (ms) | load-checkpoint: 11514.96 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the
embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP 
> 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") 
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the 
embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
estimated model parameters: 1.42303232
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the
embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
estimated model parameters without embeddings: 1.208598528
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-11-23 09:39:16
> building train, validation, and test datasets ...
 > datasets target sizes (minimum size):
    train: 73242187
    validation: 7833600
    test: 51200
> building train, validation, and test datasets for GPT ...
 > building dataset index ...
    reading sizes...
    reading pointers...
    reading document index...
    creating numpy buffer of mmap...
    creating memory view of numpy buffer...
 > finished creating indexed dataset in 0.050237 seconds
    number of documents: 304230423
 > dataset split:
    train:
     document indices in [0, 288714672) total of 288714672 documents
    validation:
     document indices in [288714672, 303926193) total of 15211521 documents
    test:
     document indices in [303926193, 304230423) total of 304230 documents
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.103 seconds
    total number of samples: 131537224
    total number of epochs: 1
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.251 seconds
    total number of samples: 13854322
    total number of epochs: 2
 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
 > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
 > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.074 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-11-23 09:39:22
done with setup ...
training ...
time (ms) | model-and-optimizer-setup: 15442.35 | train/valid/test-data-iterators-setup: 5466.43
Number of parameters: 1.42303232 billion
Number of parameters: 1.423040512 billion
Number of parameters without embeddings: 1.208598528 billion
Number of parameters without embeddings: 1.20860672 billion
[before the start of training step] datetime: 2021-11-23 09:39:22
[2021-11-23 09:39:22,407] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information
[2021-11-23 09:39:22,407] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-11-23 09:39:22,407] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-11-23 09:39:22,407] [INFO] [checkpointing.py:554:forward] ----Synchronization False
[2021-11-23 09:39:22,407] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False
/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is
not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
  if t.grad is not None:
/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward().
If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
if t.grad is not None:
[Rank 32] (after 37600 iterations) memory (MB) | allocated: 2443.63623046875 | max allocated: 4725.25341796875 | reserved: 7900.0 | max reserved: 7900.0
iteration 37600/ 152972 | consumed samples: 14171584 | consumed tokens: 29023404032 | elapsed time per iteration (ms): 4675.0 | learning rate: 1.834E-04 | global batch size: 512 | lm loss: 1.584179E+00 | loss scale: 65536.0 | grad norm: 6768.895 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[Rank 0] (after 37600 iterations) memory (MB) | allocated: 1631.6650390625 | max allocated: 3929.2744140625 | reserved: 6816.0 | max reserved: 6816.0
iteration 37800/ 152972 | consumed samples: 14273984 | consumed tokens: 29233119232 | elapsed time per iteration (ms): 4667.1 | learning rate: 1.832E-04 | global batch size: 512 | lm loss: 1.483452E+00 | loss scale: 131072.0 | grad norm: 16345.846 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-23 10:16:13,017] [INFO] [logging.py:68:log_dist] [Rank 0] step=38000, skipped=77, lr=[0.0001829196661766512, 0.0001829196661766512], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 38000 loss: 1.4246 iter time (s): 0.002 samples/sec: 220949.716
iteration 38000/ 152972 | consumed samples: 14376384 | consumed tokens: 29442834432 | elapsed time per iteration (ms): 4657.2 | learning rate: 1.829E-04 | global batch size: 512 | lm loss: 1.508213E+00 | loss scale: 65536.0 | grad norm: 6444.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 38000 | lm loss value: 1.509573E+00 | lm loss PPL: 4.524797E+00 |
-------------------------------------------------------------------------------------------
iteration 38200/ 152972 | consumed samples: 14478784 | consumed tokens: 29652549632 | elapsed time per iteration (ms): 5218.4 | learning rate: 1.827E-04 | global batch size: 512 | lm loss: 1.516392E+00 | loss scale: 65536.0 | grad norm: 6717.720 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 38400/ 152972 | consumed samples: 14581184 | consumed tokens: 29862264832 | elapsed time per iteration (ms): 4647.7 | learning rate: 1.824E-04 | global batch size: 512 | lm loss: 1.546970E+00 | loss scale: 131072.0 | grad norm: 16415.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 38600/ 152972 | consumed samples: 14683584 | consumed tokens: 30071980032 | elapsed time per iteration (ms): 4648.8 | learning rate: 1.822E-04 | global batch size: 512 | lm loss: 1.555103E+00 | loss scale: 131072.0 | grad norm: 14694.695 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 38800/ 152972 | consumed samples: 14785984 | consumed tokens: 30281695232 | elapsed time per iteration (ms): 4693.2 | learning rate: 1.820E-04 | global batch size: 512 | lm loss: 1.543373E+00 | loss scale: 131072.0 | grad norm: 19423.770 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 39000/ 152972 | consumed samples: 14888384 | consumed tokens: 30491410432 | elapsed time per iteration (ms): 4651.4 | learning rate: 1.817E-04 | global batch size: 512 | lm loss: 1.523306E+00 | loss scale: 131072.0 | grad norm: 10010.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 39000 | lm loss value: 1.576859E+00 | lm loss PPL: 4.839732E+00 |
-------------------------------------------------------------------------------------------
saving checkpoint at iteration 39000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-23 11:37:48,221] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/mp_rank_00_model_states.pt
[2021-11-23 11:37:48,649] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_23_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,650] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,653] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_30_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,654] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_17_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,657] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_29_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,657] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_22_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,658] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_27_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,659] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,659] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_14_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,660] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_24_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,662] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_28_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,662] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,662] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_10_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,663] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_15_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,664] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_5_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,665] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_4_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,667] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_28_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,667] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_4_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,667] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_21_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,668] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_23_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,668] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_8_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,670] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_16_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,672] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_14_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,672] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_15_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,672] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_27_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,672] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_10_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,673] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_8_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,677] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_5_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,680] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_25_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,681] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_16_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,685] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_18_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,688] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_11_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,689] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_20_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,691] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_13_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,693] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_17_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,694] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_12_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,695] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_6_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,695] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_7_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,696] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_6_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,697] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_19_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,697] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_25_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,698] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_26_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,699] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_30_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,699] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_22_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,700] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_29_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,700] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,700] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_31_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,701] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_31_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,701] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_19_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,701] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_12_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,702] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_24_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,703] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_11_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,703] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_3_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,705] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,706] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_21_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,706] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_7_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,706] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_18_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,708] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_1_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,710] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_9_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,717] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_9_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,718] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_13_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,718] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_20_mp_rank_00_optim_states.pt
[2021-11-23 11:37:48,718] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_2_mp_rank_01_optim_states.pt
[2021-11-23 11:37:48,720] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step39000/zero_pp_rank_26_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 39000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2697.32
iteration 39200/ 152972 | consumed samples: 14990784 | consumed tokens: 30701125632 | elapsed time per iteration (ms): 5273.9 | learning rate: 1.815E-04 | global batch size: 512 | lm loss: 1.571363E+00 | loss scale: 262144.0 | grad norm: 30247.687 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 39400/ 152972 | consumed samples: 15093184 | consumed tokens: 30910840832 | elapsed time per iteration (ms): 4639.6 | learning rate: 1.812E-04 | global batch size: 512 | lm loss: 1.551480E+00 | loss scale: 131072.0 | grad norm: 13159.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 39600/ 152972 | consumed samples: 15195584 | consumed tokens: 31120556032 | elapsed time per iteration (ms): 4639.9 | learning rate: 1.810E-04 | global batch size: 512 | lm loss: 1.567144E+00 | loss scale: 65536.0 | grad norm: 5464.160 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 39800/ 152972 | consumed samples: 15297984 | consumed tokens: 31330271232 | elapsed time per iteration (ms): 4638.1 | learning rate: 1.807E-04 | global batch size: 512 | lm loss: 1.553999E+00 | loss scale: 65536.0 | grad norm: 6098.438 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-23 12:55:11,737] [INFO] [logging.py:68:log_dist] [Rank 0] step=40000, skipped=82, lr=[0.00018045824883265537, 0.00018045824883265537], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 40000 loss: 1.6193 iter time (s): 0.002 samples/sec: 218750.649
iteration 40000/ 152972 | consumed samples: 15400384 | consumed tokens: 31539986432 | elapsed time per iteration (ms): 4642.5 | learning rate: 1.805E-04 | global batch size: 512 | lm loss: 1.584764E+00 | loss scale: 131072.0 | grad norm: 13925.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 40000 | lm loss value: 1.557079E+00 | lm loss PPL: 4.744941E+00 |
-------------------------------------------------------------------------------------------
iteration 40200/ 152972 | consumed samples: 15502784 | consumed tokens: 31749701632 | elapsed time per iteration (ms): 5222.7 | learning rate: 1.802E-04 | global batch size: 512 | lm loss: 1.568215E+00 | loss scale: 131072.0 | grad norm: 14038.354 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 40400/ 152972 | consumed samples: 15605184 | consumed tokens: 31959416832 | elapsed time per iteration (ms): 4655.7 | learning rate: 1.799E-04 | global batch size: 512 | lm loss: 1.553756E+00 | loss scale: 131072.0 | grad norm: 16147.991 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 40500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-23 13:35:55,605] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/mp_rank_00_model_states.pt
[2021-11-23 13:35:56,034] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_14_mp_rank_00_optim_states.pt
[2021-11-23 13:35:56,034] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_22_mp_rank_01_optim_states.pt
[2021-11-23 13:35:56,034] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_0_mp_rank_01_optim_states.pt
[2021-11-23 13:35:56,038] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_12_mp_rank_01_optim_states.pt
[2021-11-23 13:35:56,039] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_7_mp_rank_01_optim_states.pt
[2021-11-23 13:35:56,039] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_19_mp_rank_00_optim_states.pt
[2021-11-23 13:35:56,039] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_10_mp_rank_01_optim_states.pt
[2021-11-23 13:35:56,040] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_11_mp_rank_00_optim_states.pt
[2021-11-23 13:35:56,040] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_4_mp_rank_01_optim_states.pt
[2021-11-23 13:35:56,041] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_24_mp_rank_01_optim_states.pt
[2021-11-23 13:35:56,042] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_29_mp_rank_01_optim_states.pt
[2021-11-23 13:35:56,043] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_9_mp_rank_00_optim_states.pt
[2021-11-23 13:35:56,043] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_4_mp_rank_00_optim_states.pt
[2021-11-23 13:35:56,043] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_8_mp_rank_01_optim_states.pt
[2021-11-23 13:35:56,044] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_3_mp_rank_00_optim_states.pt
[2021-11-23 13:35:56,045] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_15_mp_rank_01_optim_states.pt
[2021-11-23 13:35:56,045] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_22_mp_rank_00_optim_states.pt
[2021-11-23 13:35:56,045] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_17_mp_rank_00_optim_states.pt
[2021-11-23 13:35:56,045] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-23 13:35:56,045] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-23 13:35:56,046] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-23 13:35:56,048] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-23 13:35:56,050] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-23 13:35:56,051] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-23 13:35:56,051] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-23 13:35:56,051] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-23 13:35:56,051] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-23 13:35:56,051] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-23 13:35:56,053] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-23 13:35:56,055] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-23 13:35:56,056] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-23 13:35:56,058] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-23 13:35:56,063] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-23 13:35:56,064] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-23 13:35:56,064] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-23 13:35:56,064] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-23 13:35:56,066] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-23 13:35:56,068] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-23 13:35:56,070] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-23 13:35:56,070] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-23 13:35:56,071] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-23 13:35:56,073] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-23 13:35:56,073] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-23 13:35:56,073] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-23 13:35:56,075] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-23 13:35:56,075] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-23 13:35:56,076] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-23 13:35:56,079] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-23 13:35:56,079] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-23 13:35:56,080] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-23 13:35:56,081] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-23 13:35:56,081] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-23 13:35:56,085] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-23 13:35:56,085] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-23 13:35:56,086] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-23 13:35:56,088] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-23 13:35:56,089] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-23 13:35:56,089] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-23 13:35:56,093] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-23 13:35:56,095] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-23 13:35:56,099] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-23 13:35:56,099] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-23 13:35:56,102] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-23 13:35:56,122] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step40500/zero_pp_rank_1_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 40500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2746.35 iteration 40600/ 152972 | consumed samples: 15707584 | consumed tokens: 32169132032 | elapsed time per iteration (ms): 4699.0 | learning rate: 1.797E-04 | global batch size: 512 | lm loss: 1.532249E+00 | loss scale: 262144.0 | grad norm: 29909.845 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 40800/ 152972 | consumed samples: 15809984 | consumed tokens: 32378847232 | elapsed time per iteration (ms): 4644.8 | learning rate: 1.794E-04 | 
global batch size: 512 | lm loss: 1.574449E+00 | loss scale: 262144.0 | grad norm: 35701.581 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 41000/ 152972 | consumed samples: 15912384 | consumed tokens: 32588562432 | elapsed time per iteration (ms): 4658.8 | learning rate: 1.792E-04 | global batch size: 512 | lm loss: 1.570280E+00 | loss scale: 131072.0 | grad norm: 13777.922 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 41000 | lm loss value: 1.520657E+00 | lm loss PPL: 4.575229E+00 |
-------------------------------------------------------------------------------------------
iteration 41200/ 152972 | consumed samples: 16014784 | consumed tokens: 32798277632 | elapsed time per iteration (ms): 5203.4 | learning rate: 1.789E-04 | global batch size: 512 | lm loss: 1.577696E+00 | loss scale: 131072.0 | grad norm: 9975.263 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 41400/ 152972 | consumed samples: 16117184 | consumed tokens: 33007992832 | elapsed time per iteration (ms): 4646.9 | learning rate: 1.786E-04 | global batch size: 512 | lm loss: 1.555496E+00 | loss scale: 262144.0 | grad norm: 39534.374 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 41600/ 152972 | consumed samples: 16219584 | consumed tokens: 33217708032 | elapsed time per iteration (ms): 4651.0 | learning rate: 1.784E-04 | global batch size: 512 | lm loss: 1.564247E+00 | loss scale: 262144.0 | grad norm: 15746.886 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 41800/ 152972 | consumed samples: 16321984 | consumed tokens: 33427423232 | elapsed time per iteration (ms): 4679.3 | learning rate: 1.781E-04 | global batch size: 512 | lm loss: 1.556884E+00 | loss scale: 262144.0 |
grad norm: 28084.163 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [2021-11-23 15:34:14,187] [INFO] [logging.py:68:log_dist] [Rank 0] step=42000, skipped=86, lr=[0.00017784993848792957, 0.00017784993848792957], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 42000 loss: 2.6731 iter time (s): 0.002 samples/sec: 220305.234 iteration 42000/ 152972 | consumed samples: 16424384 | consumed tokens: 33637138432 | elapsed time per iteration (ms): 4650.8 | learning rate: 1.778E-04 | global batch size: 512 | lm loss: 1.532468E+00 | loss scale: 262144.0 | grad norm: 41832.180 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------- valid loss at iteration 42000 | lm loss value: 1.505113E+00 | lm loss PPL: 4.504662E+00 | ------------------------------------------------------------------------------------------- saving checkpoint at iteration 42000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-23 15:36:06,937] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/mp_rank_00_model_states.pt [2021-11-23 15:36:07,362] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-23 15:36:07,363] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,364] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-23 15:36:07,366] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-23 15:36:07,367] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,368] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,368] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-23 15:36:07,368] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,370] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,371] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-23 15:36:07,371] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-23 15:36:07,371] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,372] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,373] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,373] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-23 15:36:07,373] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,374] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-23 15:36:07,374] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-23 15:36:07,375] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,375] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-23 15:36:07,375] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,376] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-23 15:36:07,377] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,377] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,379] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-23 15:36:07,383] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-23 15:36:07,387] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,388] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,389] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,389] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,390] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-23 15:36:07,394] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,403] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,403] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,403] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-23 15:36:07,404] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-23 15:36:07,404] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-23 15:36:07,406] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,407] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-23 15:36:07,407] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,407] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,409] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,408] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,409] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-23 15:36:07,409] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-23 15:36:07,410] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-23 15:36:07,410] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,411] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,412] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,412] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-23 15:36:07,412] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,415] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-23 15:36:07,415] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,417] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-23 15:36:07,418] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-23 15:36:07,419] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-23 15:36:07,420] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-23 15:36:07,423] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-23 15:36:07,427] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_16_mp_rank_00_optim_states.pt
[2021-11-23 15:36:07,428] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_2_mp_rank_01_optim_states.pt
[2021-11-23 15:36:07,429] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_18_mp_rank_00_optim_states.pt
[2021-11-23 15:36:07,435] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2021-11-23 15:36:07,440] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_29_mp_rank_00_optim_states.pt
[2021-11-23 15:36:07,450] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step42000/zero_pp_rank_28_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 42000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2803.66
iteration 42200/ 152972 | consumed samples: 16526784 | consumed tokens: 33846853632 | elapsed time per iteration (ms): 5224.5 | learning rate: 1.776E-04 | global batch size: 512 | lm loss: 1.517369E+00 | loss scale: 131072.0 | grad norm: 15229.408 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 42400/ 152972 | consumed samples: 16629184 | consumed tokens: 34056568832 | elapsed time per iteration (ms): 4712.4 | learning rate: 1.773E-04 | global batch size: 512 | lm loss: 1.562072E+00 | loss scale: 65536.0 | grad norm: 5440.631 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 42600/ 152972 | consumed samples: 16731584 | consumed tokens: 34266284032 | elapsed time per iteration (ms): 4678.4 | learning rate: 1.770E-04 | global batch size: 512 | lm loss: 1.584806E+00 | loss scale: 65536.0 | grad norm: 7879.533 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 42800/ 152972 | consumed samples: 16833984 | consumed tokens: 34475999232 | elapsed time per iteration (ms): 4649.0 | learning rate: 1.768E-04 | global batch size: 512 | lm loss: 1.517342E+00 | loss scale: 65536.0 | grad norm: 6180.782 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 43000/ 152972 | consumed samples: 16936384 | consumed tokens: 34685714432 | elapsed time per iteration (ms): 4664.1 | learning rate: 1.765E-04 | global batch size: 512 | lm loss: 1.583167E+00 | loss scale: 131072.0 | grad norm: 18160.095 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 43000 | lm loss value: 1.514982E+00 | lm loss PPL: 4.549338E+00 |
-------------------------------------------------------------------------------------------
iteration 43200/ 152972 | consumed samples: 17038784 | consumed tokens: 34895429632 | elapsed time per iteration (ms): 5395.5 | learning rate: 1.762E-04 | global batch size: 512 | lm loss: 1.581654E+00 | loss scale: 131072.0 | grad norm: 9401.730 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 43400/ 152972 | consumed samples: 17141184 | consumed tokens: 35105144832 | elapsed time per iteration (ms): 5538.1 | learning rate: 1.759E-04 | global batch size: 512 | lm loss: 1.560742E+00 | loss scale: 131072.0 | grad norm: 14137.880 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 43500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-23 17:38:21,816] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/mp_rank_00_model_states.pt
[2021-11-23 17:38:22,254] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_31_mp_rank_01_optim_states.pt
[2021-11-23 17:38:22,254] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_4_mp_rank_00_optim_states.pt
[2021-11-23 17:38:22,255] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_10_mp_rank_01_optim_states.pt
[2021-11-23 17:38:22,255] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_31_mp_rank_00_optim_states.pt
[2021-11-23 17:38:22,255] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_27_mp_rank_00_optim_states.pt
[2021-11-23 17:38:22,255] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_22_mp_rank_01_optim_states.pt
[2021-11-23 17:38:22,257] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-23 17:38:22,259] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-23 17:38:22,259] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-23 17:38:22,261] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-23 17:38:22,262] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-23 17:38:22,262] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-23 17:38:22,263] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-23 17:38:22,264] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-23 17:38:22,264] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-23 17:38:22,265] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-23 17:38:22,265] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-23 17:38:22,266] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-23 17:38:22,268] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-23 17:38:22,269] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-23 17:38:22,271] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-23 17:38:22,273] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-23 17:38:22,275] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-23 17:38:22,276] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-23 17:38:22,277] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-23 17:38:22,277] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-23 17:38:22,279] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-23 17:38:22,284] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-23 17:38:22,284] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-23 17:38:22,285] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-23 17:38:22,285] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-23 17:38:22,288] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-23 17:38:22,289] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-23 17:38:22,290] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-23 17:38:22,290] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-23 17:38:22,291] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-23 17:38:22,291] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-23 17:38:22,291] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-23 17:38:22,292] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-23 17:38:22,295] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-23 17:38:22,298] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-23 17:38:22,299] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-23 17:38:22,299] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-23 17:38:22,301] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-23 17:38:22,303] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-23 17:38:22,306] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-23 17:38:22,306] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-23 17:38:22,310] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-23 17:38:22,314] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-23 17:38:22,317] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-23 17:38:22,319] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-23 17:38:22,321] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-23 17:38:22,324] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-23 17:38:22,328] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-23 17:38:22,347] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-23 17:38:22,360] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-23 17:38:22,366] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-23 17:38:22,374] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-23 17:38:22,435] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-23 17:38:22,436] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-23 17:38:22,493] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-23 17:38:22,513] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-23 17:38:22,513] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_17_mp_rank_01_optim_states.pt
[2021-11-23 17:38:22,515] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step43500/zero_pp_rank_13_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 43500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 3107.99
iteration 43600/ 152972 | consumed samples: 17243584 | consumed tokens: 35314860032 | elapsed time per iteration (ms): 4944.9 | learning rate: 1.757E-04 | global batch size: 512 | lm loss: 1.583980E+00 | loss scale: 131072.0 | grad norm: 10064.159 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 43800/ 152972 | consumed samples: 17345984 | consumed tokens: 35524575232 | elapsed time per iteration (ms): 6329.4 | learning rate: 1.754E-04 | global batch size: 512 | lm loss: 1.560069E+00 | loss scale: 65536.0 | grad norm: 8600.703 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-23 18:23:28,954] [INFO] [logging.py:68:log_dist] [Rank 0] step=44000, skipped=90, lr=[0.0001751009678178521, 0.0001751009678178521], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 44000 loss: 0.9002 iter time (s): 0.002 samples/sec: 220786.829
iteration 44000/ 152972 | consumed samples: 17448384 | consumed tokens: 35734290432 | elapsed time per iteration (ms): 4637.5 | learning rate: 1.751E-04 | global batch size: 512 | lm loss: 1.482616E+00 | loss scale: 65536.0 | grad norm: 4410.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 44000 | lm loss value: 1.518874E+00 | lm loss PPL: 4.567080E+00 |
-------------------------------------------------------------------------------------------
iteration 44200/ 152972 | consumed samples: 17550784 | consumed tokens: 35944005632 | elapsed time per iteration (ms): 5612.8 | learning rate: 1.748E-04 | global batch size: 512 | lm loss: 1.613770E+00 | loss scale: 32768.0 | grad norm: 3170.172 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 44400/ 152972 | consumed samples: 17653184 | consumed tokens: 36153720832 | elapsed time per iteration (ms): 4639.7 | learning rate: 1.745E-04 | global batch size: 512 | lm loss: 1.545649E+00 | loss scale: 32768.0 | grad norm: 4838.177 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 44600/ 152972 | consumed samples: 17755584 | consumed tokens: 36363436032 | elapsed time per iteration (ms): 4836.5 | learning rate: 1.742E-04 | global batch size: 512 | lm loss: 1.566421E+00 | loss scale: 65536.0 | grad norm: 7397.425 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 44800/ 152972 | consumed samples: 17857984 | consumed tokens: 36573151232 | elapsed time per iteration (ms): 4782.2 | learning rate: 1.740E-04 | global batch size: 512 | lm loss: 1.514187E+00 | loss scale: 65536.0 | grad norm: 8226.418 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 45000/ 152972 | consumed samples: 17960384 | consumed tokens: 36782866432 | elapsed time per iteration (ms): 4781.7 | learning rate: 1.737E-04 | global batch size: 512 | lm loss: 1.590702E+00 | loss scale: 65536.0 | grad norm: 4764.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 45000 | lm loss value: 1.561856E+00 | lm loss PPL: 4.767663E+00 |
------------------------------------------------------------------------------------------- saving checkpoint at iteration 45000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-23 19:48:09,222] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/mp_rank_00_model_states.pt [2021-11-23 19:48:09,657] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,660] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,660] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,661] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,662] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,663] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,663] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,664] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,664] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,665] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,665] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,665] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,666] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,669] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,670] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,672] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,674] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,674] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,676] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,676] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,677] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,678] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,679] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,679] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,680] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,681] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,682] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,684] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,684] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,685] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,687] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,688] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,690] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,691] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,692] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,694] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,696] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,696] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,696] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,698] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,700] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,700] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,701] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,703] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,703] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,704] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,704] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,704] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,705] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,706] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,707] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,707] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,708] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,709] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,709] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,710] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,713] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,713] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,714] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,717] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,720] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,725] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-23 19:48:09,731] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-23 19:48:09,732] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step45000/zero_pp_rank_21_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 45000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2665.03 iteration 45200/ 152972 | consumed samples: 18062784 | consumed tokens: 36992581632 | elapsed time per iteration (ms): 7131.0 | learning rate: 1.734E-04 | global batch size: 512 | lm loss: 1.548332E+00 | loss scale: 131072.0 | grad norm: 11708.431 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 45400/ 152972 | consumed samples: 18165184 | consumed tokens: 37202296832 | elapsed time per iteration (ms): 8180.5 | learning rate: 1.731E-04 | global batch size: 512 | lm loss: 1.537413E+00 | loss scale: 131072.0 | grad norm: 15302.788 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 45600/ 152972 | consumed samples: 18267584 | consumed tokens: 37412012032 | elapsed time per iteration (ms): 8022.6 | learning rate: 1.728E-04 | global batch size: 512 | lm loss: 1.578823E+00 | loss scale: 262144.0 | grad norm: 25622.710 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 45800/ 152972 | consumed samples: 18369984 | consumed tokens: 37621727232 | elapsed time per iteration (ms): 7999.1 | learning rate: 1.725E-04 | global batch size: 512 | lm loss: 1.538459E+00 | loss scale: 131072.0 | grad norm: 10673.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [2021-11-23 21:47:31,977] [INFO] [logging.py:68:log_dist] [Rank 0] step=46000, skipped=93, lr=[0.00017221516654323494, 
0.00017221516654323494], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 46000/ 152972 | consumed samples: 18472384 | consumed tokens: 37831442432 | elapsed time per iteration (ms): 5229.1 | learning rate: 1.722E-04 | global batch size: 512 | lm loss: 1.563845E+00 | loss scale: 131072.0 | grad norm: 16059.905 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
steps: 46000 loss: 2.0135 iter time (s): 0.002 samples/sec: 221026.399
-------------------------------------------------------------------------------------------
valid loss at iteration 46000 | lm loss value: 1.502583E+00 | lm loss PPL: 4.493280E+00 |
-------------------------------------------------------------------------------------------
iteration 46200/ 152972 | consumed samples: 18574784 | consumed tokens: 38041157632 | elapsed time per iteration (ms): 5294.4 | learning rate: 1.719E-04 | global batch size: 512 | lm loss: 1.565707E+00 | loss scale: 65536.0 | grad norm: 4389.460 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 46400/ 152972 | consumed samples: 18677184 | consumed tokens: 38250872832 | elapsed time per iteration (ms): 4633.2 | learning rate: 1.716E-04 | global batch size: 512 | lm loss: 1.553213E+00 | loss scale: 65536.0 | grad norm: 5150.470 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 46500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-23 22:28:22,475] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/mp_rank_00_model_states.pt
[2021-11-23 22:28:22,901] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_7_mp_rank_01_optim_states.pt
[2021-11-23
22:28:22,902] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,903] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,904] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,905] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,906] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,906] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,907] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,908] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,909] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,911] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,911] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,911] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,914] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,914] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,915] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,915] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,915] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,916] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,917] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,917] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,918] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,921] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,923] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,923] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,924] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,926] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,928] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,929] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,931] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,932] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,935] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,935] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,936] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,938] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,939] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,940] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,940] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,941] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,942] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,943] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,943] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,943] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,948] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,949] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,950] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,951] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,952] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,953] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,955] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,955] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,957] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,958] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,958] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,958] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,959] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,961] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,962] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,964] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,964] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,972] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,977] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-23 22:28:22,978] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-23 22:28:22,987] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step46500/zero_pp_rank_20_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 46500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2751.61 iteration 46600/ 152972 | consumed samples: 18779584 | consumed tokens: 38460588032 | elapsed time per iteration (ms): 4647.2 | learning rate: 1.713E-04 | global batch size: 512 | lm loss: 1.555159E+00 | loss scale: 131072.0 | grad norm: 18422.650 
| num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 46800/ 152972 | consumed samples: 18881984 | consumed tokens: 38670303232 | elapsed time per iteration (ms): 4638.7 | learning rate: 1.710E-04 | global batch size: 512 | lm loss: 1.495719E+00 | loss scale: 65536.0 | grad norm: 5374.687 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 47000/ 152972 | consumed samples: 18984384 | consumed tokens: 38880018432 | elapsed time per iteration (ms): 4650.9 | learning rate: 1.707E-04 | global batch size: 512 | lm loss: 1.492057E+00 | loss scale: 65536.0 | grad norm: 5760.845 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 47000 | lm loss value: 1.577825E+00 | lm loss PPL: 4.844406E+00 |
-------------------------------------------------------------------------------------------
iteration 47200/ 152972 | consumed samples: 19086784 | consumed tokens: 39089733632 | elapsed time per iteration (ms): 5357.7 | learning rate: 1.704E-04 | global batch size: 512 | lm loss: 1.504508E+00 | loss scale: 65536.0 | grad norm: 9514.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 47400/ 152972 | consumed samples: 19189184 | consumed tokens: 39299448832 | elapsed time per iteration (ms): 4652.6 | learning rate: 1.701E-04 | global batch size: 512 | lm loss: 1.550068E+00 | loss scale: 131072.0 | grad norm: 8766.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 47600/ 152972 | consumed samples: 19291584 | consumed tokens: 39509164032 | elapsed time per iteration (ms): 4654.1 | learning rate: 1.698E-04 | global batch size: 512 | lm loss: 1.511803E+00 | loss scale: 65536.0 | grad norm: 5103.932 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 47800/ 152972 | consumed samples: 19393984 | consumed tokens: 39718879232 | elapsed time per iteration (ms): 4641.9 | learning rate: 1.695E-04 | global batch size: 512 | lm loss: 1.561847E+00 | loss scale: 8192.0 | grad norm: 1614.940 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-24 00:26:56,144] [INFO] [logging.py:68:log_dist] [Rank 0] step=48000, skipped=101, lr=[0.00016920716267278184, 0.00016920716267278184], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 48000 loss: 1.7057 iter time (s): 0.002 samples/sec: 220126.890
iteration 48000/ 152972 | consumed samples: 19496384 | consumed tokens: 39928594432 | elapsed time per iteration (ms): 4650.1 | learning rate: 1.692E-04 | global batch size: 512 | lm loss: 1.540429E+00 | loss scale: 8192.0 | grad norm: 921.530 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 48000 | lm loss value: 1.517791E+00 | lm loss PPL: 4.562137E+00 |
-------------------------------------------------------------------------------------------
saving checkpoint at iteration 48000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-24 00:29:02,839] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/mp_rank_00_model_states.pt
[2021-11-24 00:29:03,264] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_7_mp_rank_01_optim_states.pt
[2021-11-24 00:29:03,264] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-24 00:29:03,265] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-24 00:29:03,268] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-24 00:29:03,268] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-24 00:29:03,268] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-24 00:29:03,270] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-24 00:29:03,271] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-24 00:29:03,271] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-24 00:29:03,271] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-24 00:29:03,273] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-24 00:29:03,273] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-24 00:29:03,273] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-24 00:29:03,274] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-24 00:29:03,274] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-24 00:29:03,276] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-24 00:29:03,276] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-24 00:29:03,277] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-24 00:29:03,277] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-24 00:29:03,279] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-24 00:29:03,280] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-24 00:29:03,280] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-24 00:29:03,280] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-24 00:29:03,281] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-24 00:29:03,282] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-24 00:29:03,287] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-24 00:29:03,287] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-24 00:29:03,288] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-24 00:29:03,288] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-24 00:29:03,288] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-24 00:29:03,289] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-24 00:29:03,290] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-24 00:29:03,295] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-24 00:29:03,300] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-24 00:29:03,300] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-24 00:29:03,301] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-24 00:29:03,301] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-24 00:29:03,302] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-24 00:29:03,302] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-24 00:29:03,303] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-24 00:29:03,304] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-24 00:29:03,305] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-24 00:29:03,305] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-24 00:29:03,306] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-24 00:29:03,307] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-24 00:29:03,308] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-24 00:29:03,309] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-24 00:29:03,309] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-24 00:29:03,310] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-24 00:29:03,310] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-24 00:29:03,312] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-24 00:29:03,312] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-24 00:29:03,314] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-24 00:29:03,314] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-24 00:29:03,315] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-24 00:29:03,317] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-24 00:29:03,317] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-24 00:29:03,320] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_22_mp_rank_00_optim_states.pt
[2021-11-24 00:29:03,325] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_30_mp_rank_00_optim_states.pt
[2021-11-24 00:29:03,328] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_9_mp_rank_01_optim_states.pt
[2021-11-24 00:29:03,332] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_26_mp_rank_00_optim_states.pt
[2021-11-24 00:29:03,332] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_1_mp_rank_01_optim_states.pt
[2021-11-24 00:29:03,332] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_5_mp_rank_01_optim_states.pt
[2021-11-24 00:29:03,348] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step48000/zero_pp_rank_0_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 48000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2763.10
iteration 48200/ 152972 | consumed samples: 19598784 | consumed tokens: 40138309632 | elapsed time per iteration (ms): 5284.6 | learning rate: 1.689E-04 | global batch size: 512 | lm loss: 1.502381E+00 | loss scale: 16384.0 | grad norm: 1468.319 |
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 48400/ 152972 | consumed samples: 19701184 | consumed tokens: 40348024832 | elapsed time per iteration (ms): 4746.8 | learning rate: 1.686E-04 | global batch size: 512 | lm loss: 1.554941E+00 | loss scale: 16384.0 | grad norm: 1546.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 48600/ 152972 | consumed samples: 19803584 | consumed tokens: 40557740032 | elapsed time per iteration (ms): 4674.4 | learning rate: 1.683E-04 | global batch size: 512 | lm loss: 1.532146E+00 | loss scale: 16384.0 | grad norm: 2090.528 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 48800/ 152972 | consumed samples: 19905984 | consumed tokens: 40767455232 | elapsed time per iteration (ms): 4647.0 | learning rate: 1.680E-04 | global batch size: 512 | lm loss: 1.539840E+00 | loss scale: 32768.0 | grad norm: 2898.434 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 49000/ 152972 | consumed samples: 20008384 | consumed tokens: 40977170432 | elapsed time per iteration (ms): 4638.6 | learning rate: 1.677E-04 | global batch size: 512 | lm loss: 1.531885E+00 | loss scale: 32768.0 | grad norm: 4116.441 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 49000 | lm loss value: 1.537323E+00 | lm loss PPL: 4.652122E+00 |
-------------------------------------------------------------------------------------------
iteration 49200/ 152972 | consumed samples: 20110784 | consumed tokens: 41186885632 | elapsed time per iteration (ms): 5246.8 | learning rate: 1.673E-04 | global batch size: 512 | lm loss: 1.523679E+00 | loss scale: 32768.0 | grad norm: 3011.653 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 49400/ 152972 | consumed samples: 20213184 | consumed tokens: 41396600832 | elapsed time per iteration (ms): 4639.2 | learning rate: 1.670E-04 | global batch size: 512 | lm loss: 1.518560E+00 | loss scale: 32768.0 | grad norm: 3710.171 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 49500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-24 02:27:37,766] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/mp_rank_00_model_states.pt
[2021-11-24 02:27:38,188] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_16_mp_rank_01_optim_states.pt
[2021-11-24 02:27:38,193] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_29_mp_rank_01_optim_states.pt
[2021-11-24 02:27:38,193] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_25_mp_rank_01_optim_states.pt
[2021-11-24 02:27:38,194] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_11_mp_rank_01_optim_states.pt
[2021-11-24 02:27:38,197] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_15_mp_rank_00_optim_states.pt
[2021-11-24 02:27:38,197] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-24 02:27:38,197] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-24 02:27:38,197] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-24 02:27:38,198] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-24 02:27:38,199] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-24 02:27:38,199] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-24 02:27:38,200] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-24 02:27:38,200] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-24 02:27:38,202] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-24 02:27:38,203] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-24 02:27:38,203] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-24 02:27:38,204] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-24 02:27:38,204] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-24 02:27:38,205] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-24 02:27:38,206] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-24 02:27:38,208] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-24 02:27:38,208] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-24 02:27:38,209] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-24 02:27:38,209] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-24 02:27:38,215] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-24 02:27:38,215] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-24 02:27:38,215] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-24 02:27:38,215] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-24 02:27:38,217] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-24 02:27:38,217] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-24 02:27:38,220] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-24 02:27:38,222] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-24 02:27:38,223] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-24 02:27:38,228] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-24 02:27:38,230] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-24 02:27:38,230] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-24 02:27:38,231] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-24 02:27:38,232] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-24 02:27:38,232] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-24 02:27:38,232] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-24 02:27:38,233] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-24 02:27:38,233] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-24 02:27:38,234] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-24 02:27:38,235] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-24 02:27:38,236] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-24 02:27:38,236] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-24 02:27:38,237] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-24 02:27:38,237] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-24 02:27:38,237] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-24 02:27:38,238] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-24 02:27:38,238] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-24 02:27:38,239] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-24 02:27:38,239] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-24 02:27:38,240] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-24 02:27:38,240] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-24 02:27:38,241] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-24 02:27:38,249] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-24 02:27:38,253] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-24 02:27:38,253] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-24 02:27:38,253] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-24 02:27:38,255] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-24 02:27:38,261] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_18_mp_rank_00_optim_states.pt
[2021-11-24 02:27:38,263] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_2_mp_rank_00_optim_states.pt
[2021-11-24 02:27:38,269] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step49500/zero_pp_rank_5_mp_rank_01_optim_states.pt
successfully saved checkpoint at iteration 49500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2734.45
iteration 49600/ 152972 | consumed samples: 20315584 | consumed tokens: 41606316032 | elapsed time per iteration (ms): 4663.8 | learning rate: 1.667E-04 | global batch size: 512 | lm loss: 1.552169E+00 | loss scale: 65536.0 | grad norm: 4346.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 49800/ 152972 | consumed samples: 20417984 | consumed tokens: 41816031232 | elapsed time per iteration (ms): 4670.0 | learning rate: 1.664E-04 | global batch size: 512 | lm loss: 1.530659E+00 | loss scale: 65536.0 | grad norm: 5131.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-24 03:06:30,272] [INFO] [logging.py:68:log_dist] [Rank 0] step=50000, skipped=102, lr=[0.00016606446756515646, 0.00016606446756515646], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 50000 loss: 1.8082 iter time (s): 0.002 samples/sec: 219598.157
iteration 50000/ 152972 | consumed samples: 20520384 | consumed tokens: 42025746432 | elapsed time per iteration (ms): 4659.5 | learning rate: 1.661E-04 | global batch size: 512 | lm loss: 1.552802E+00 | loss scale: 65536.0 | grad norm: 6984.988 | num zeros: 0.0 | number of skipped iterations: 0
| number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 50000 | lm loss value: 1.447613E+00 | lm loss PPL: 4.252950E+00 |
-------------------------------------------------------------------------------------------
iteration 50200/ 152972 | consumed samples: 20622784 | consumed tokens: 42235461632 | elapsed time per iteration (ms): 5224.3 | learning rate: 1.657E-04 | global batch size: 512 | lm loss: 1.579614E+00 | loss scale: 131072.0 | grad norm: 14757.983 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 50400/ 152972 | consumed samples: 20725184 | consumed tokens: 42445176832 | elapsed time per iteration (ms): 4651.8 | learning rate: 1.654E-04 | global batch size: 512 | lm loss: 1.519649E+00 | loss scale: 131072.0 | grad norm: 14420.491 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 50600/ 152972 | consumed samples: 20827584 | consumed tokens: 42654892032 | elapsed time per iteration (ms): 4650.8 | learning rate: 1.651E-04 | global batch size: 512 | lm loss: 1.527410E+00 | loss scale: 262144.0 | grad norm: 29308.732 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 50800/ 152972 | consumed samples: 20929984 | consumed tokens: 42864607232 | elapsed time per iteration (ms): 4663.5 | learning rate: 1.648E-04 | global batch size: 512 | lm loss: 1.569419E+00 | loss scale: 262144.0 | grad norm: 25066.108 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 51000/ 152972 | consumed samples: 21032384 | consumed tokens: 43074322432 | elapsed time per iteration (ms): 4647.8 | learning rate: 1.645E-04 | global batch size: 512 | lm loss: 1.536307E+00 | loss scale: 131072.0 | grad norm: 10307.701 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
------------------------------------------------------------------------------------------- valid loss at iteration 51000 | lm loss value: 1.499980E+00 | lm loss PPL: 4.481600E+00 | ------------------------------------------------------------------------------------------- saving checkpoint at iteration 51000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-24 04:27:48,544] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/mp_rank_00_model_states.pt [2021-11-24 04:27:48,963] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-24 04:27:48,965] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-24 04:27:48,970] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-24 04:27:48,972] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-24 04:27:48,973] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-24 04:27:48,974] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-24 04:27:48,975] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-24 04:27:48,976] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-24 04:27:48,977] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-24 04:27:48,977] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-24 04:27:48,977] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-24 04:27:48,980] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-24 04:27:48,981] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-24 04:27:48,981] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-24 04:27:48,982] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-24 04:27:48,982] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-24 04:27:48,983] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-24 04:27:48,984] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-24 04:27:48,984] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-24 04:27:48,987] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-24 04:27:48,988] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-24 04:27:48,988] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-24 04:27:48,988] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-24 04:27:48,989] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-24 04:27:48,990] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-24 04:27:48,991] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-24 04:27:48,991] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-24 04:27:48,993] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-24 04:27:48,995] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-24 04:27:48,996] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-24 04:27:49,003] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-24 04:27:49,005] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-24 04:27:49,006] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-24 04:27:49,006] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-24 04:27:49,009] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-24 04:27:49,009] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-24 04:27:49,009] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-24 04:27:49,010] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-24 04:27:49,012] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-24 04:27:49,013] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-24 04:27:49,014] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-24 04:27:49,014] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-24 04:27:49,014] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-24 04:27:49,015] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-24 04:27:49,017] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-24 04:27:49,018] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-24 04:27:49,021] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-24 04:27:49,021] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-24 04:27:49,022] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-24 04:27:49,022] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-24 04:27:49,022] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-24 04:27:49,023] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-24 04:27:49,024] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-24 04:27:49,024] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-24 04:27:49,025] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-24 04:27:49,025] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-24 04:27:49,026] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-24 04:27:49,026] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-24 04:27:49,028] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-24 04:27:49,029] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-24 04:27:49,030] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-24 04:27:49,032] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-24 04:27:49,033] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-24 04:27:49,043] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51000/zero_pp_rank_27_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 51000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2694.98 iteration 51200/ 152972 | consumed samples: 21134784 | consumed tokens: 43284037632 | elapsed time per iteration (ms): 5209.6 | learning rate: 1.641E-04 | global batch size: 512 | lm loss: 1.538503E+00 | loss scale: 131072.0 | grad norm: 9527.893 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 51400/ 152972 | consumed samples: 21237184 | consumed tokens: 43493752832 | elapsed time per iteration (ms): 4649.1 | learning rate: 1.638E-04 | global batch size: 512 | lm loss: 1.548578E+00 | loss scale: 65536.0 | grad norm: 4509.272 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 51600/ 152972 | consumed samples: 21339584 | consumed tokens: 43703468032 | elapsed time per iteration (ms): 4667.1 | learning rate: 1.635E-04 | global batch size: 512 | lm loss: 1.522854E+00 | loss scale: 65536.0 | grad norm: 6601.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | saving checkpoint at iteration 51780 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-24 05:28:29,703] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model 
checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/mp_rank_00_model_states.pt [2021-11-24 05:28:30,127] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-24 05:28:30,132] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-24 05:28:30,133] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-24 05:28:30,133] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-24 05:28:30,134] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-24 05:28:30,135] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-24 05:28:30,135] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-24 05:28:30,136] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-24 05:28:30,136] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-24 05:28:30,136] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-24 05:28:30,138] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-24 05:28:30,139] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-24 05:28:30,139] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-24 05:28:30,140] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-24 05:28:30,140] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-24 05:28:30,142] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-24 05:28:30,143] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-24 05:28:30,144] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-24 05:28:30,145] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-24 05:28:30,146] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-24 05:28:30,146] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-24 05:28:30,146] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-24 05:28:30,146] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-24 05:28:30,148] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-24 05:28:30,149] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-24 05:28:30,149] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-24 05:28:30,153] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-24 05:28:30,157] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-24 05:28:30,159] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-24 05:28:30,161] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-24 05:28:30,163] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-24 05:28:30,164] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-24 05:28:30,164] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-24 05:28:30,167] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-24 05:28:30,168] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-24 05:28:30,168] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-24 05:28:30,170] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-24 05:28:30,171] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-24 05:28:30,171] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-24 05:28:30,172] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-24 05:28:30,173] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-24 05:28:30,174] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-24 05:28:30,175] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-24 05:28:30,175] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-24 05:28:30,177] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-24 05:28:30,177] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-24 05:28:30,178] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-24 05:28:30,179] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_7_mp_rank_01_optim_states.pt
[2021-11-24 05:28:30,179] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_10_mp_rank_01_optim_states.pt
[2021-11-24 05:28:30,181] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_2_mp_rank_00_optim_states.pt
[2021-11-24 05:28:30,181] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_17_mp_rank_01_optim_states.pt
[2021-11-24 05:28:30,181] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_24_mp_rank_00_optim_states.pt
[2021-11-24 05:28:30,181] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_16_mp_rank_01_optim_states.pt
[2021-11-24 05:28:30,183] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_6_mp_rank_00_optim_states.pt
[2021-11-24 05:28:30,183] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_13_mp_rank_01_optim_states.pt
[2021-11-24 05:28:30,186] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_1_mp_rank_01_optim_states.pt
[2021-11-24 05:28:30,188] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_4_mp_rank_00_optim_states.pt
[2021-11-24 05:28:30,197] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_8_mp_rank_01_optim_states.pt
[2021-11-24 05:28:30,200] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_0_mp_rank_01_optim_states.pt
[2021-11-24 05:28:30,200] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_5_mp_rank_00_optim_states.pt
[2021-11-24 05:28:30,204] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_19_mp_rank_00_optim_states.pt
[2021-11-24 05:28:30,205] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_28_mp_rank_00_optim_states.pt
[2021-11-24 05:28:30,211] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_20_mp_rank_00_optim_states.pt
[2021-11-24 05:28:30,267] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step51780/zero_pp_rank_0_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 51780 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2857.65
[exiting program after 1190.0619795163473 minutes] datetime: 2021-11-24 05:28:30
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed.
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- ----------------------------------------------------------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ----------------------------------------------------------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja--------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY][OKAY][OKAY] ninjaninjaninjaninja .................. .................. .................................... [OKAY] [OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- --------------------------------------------------op name op name ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op name op nameop name ................ ................ ................ ................installedinstalled ..installed..installed compatible..compatible .. --------------------------------------------------compatible--------------------------------------------------compatible ................op nameop name................ installed ................installed................ .. .. 
installed installedcompatible compatible .. ..-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- compatible compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam cpu_adamcpu_adam[YES]............... ....................................[YES] [YES][YES] ......[OKAY] ......[OKAY]...... [OKAY][OKAY] cpu_adam ............... [YES]cpu_adam ...... cpu_adamcpu_adam[OKAY]............... fused_adam ............. fused_adam[YES] fused_adamfused_adam ................... ............. ............. [YES][OKAY][YES] ...............[YES] ............... ...... [YES] [YES][OKAY] ...... [YES]...... [OKAY]fused_lamb......[OKAY] .............[OKAY] fused_adam............ ............. [OKAY] [OKAY] fused_adam[YES] [YES]fused_lamb fused_lamb...... fused_lamb ............. [OKAY] .......................... ................... [YES][OKAY] [YES] [YES][YES]...... ......[OKAY]...... [OKAY][OKAY] ......fused_adam [OKAY]fused_adam.............fused_lamb ..........................fused_lamb [YES] [YES] .............[YES] ...... ......[YES]...... [OKAY] [OKAY] sparse_attn ............ [NO] ....... [OKAY] ...... [OKAY] [OKAY]fused_lamb transformersparse_attn sparse_attn sparse_attn........................ ............ ............ [YES][NO] [NO] [NO].................... .......[OKAY] [OKAY][OKAY] [OKAY] ............. [YES] fused_lamb...... .............[OKAY] transformer transformer............stochastic_transformertransformer ............[YES]............. [YES] ......[YES] [YES] ......[OKAY] [OKAY]...... ...... [OKAY][OKAY] [YES] ...... [OKAY]sparse_attn stochastic_transformerstochastic_transformer .stochastic_transformer. [YES]. [YES] ...... [YES] ...... [OKAY] ...... [OKAY] [OKAY] ............ sparse_attn[NO] ................... 
[NO][OKAY] .......sparse_attn [OKAY] transformer sparse_attn ............ ............transformer............ [YES]............[NO] [NO] ....... ...... [YES] .......[OKAY] [OKAY] ...... [OKAY] transformer [OKAY]stochastic_transformer............ . [YES][YES]transformer stochastic_transformer...... ...... ............. [OKAY][YES][OKAY][YES] ............stochastic_transformer [OKAY][OKAY]. [YES] ......stochastic_transformer [OKAY]. [YES] ...... [OKAY] ninjaninjaninja .................. ninja.................................... [OKAY] [OKAY]-------------------------------------------------- ..................[OKAY] -------------------------------------------------- op name[OKAY] --------------------------------------------------op name ................ -------------------------------------------------- ................ op nameinstalledinstalled ................op name .. ..installed ................compatible compatible.. installed --------------------------------------------------compatible.. compatible-------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam cpu_adam.............................. ...............[YES][YES] [YES]............ [OKAY]cpu_adam .....................[OKAY] [OKAY] fused_adam[YES] ......fused_adam ............. fused_adam[OKAY] ............. [YES] .............[YES]...... ......[YES][OKAY] [OKAY] ...... [OKAY]fused_lamb fused_lamb ............. fused_lamb[YES]............. fused_adam............. [YES]......[YES] ...................[OKAY]...... [YES][OKAY][OKAY] ...... [OKAY] sparse_attn ............ [NO]fused_lamb .......sparse_attnsparse_attn ............[OKAY]............ ............. [NO] [NO] transformer [YES].............. ...... [OKAY]............ [OKAY] [OKAY] [YES] transformer...... transformer ............ [OKAY] ............ [YES] [YES]...... stochastic_transformer ...... [OKAY] . 
[OKAY]sparse_attn [YES] stochastic_transformer stochastic_transformer...... .............. [OKAY] [YES] [NO][YES] ...... ...... [OKAY] ....... [OKAY][OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY] [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name-------------------------------------------------- op name ................op name op name................ installed ................installed ................ ..installed .. compatibleinstalled.. --------------------------------------------------compatible..compatible compatible-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES]cpu_adam cpu_adam...... cpu_adam ...............[OKAY] ............... ...............[YES][YES] ............[YES] fused_adam[OKAY] [OKAY] ................... [YES][OKAY] ......fused_adam fused_adam [OKAY] ............. ............. fused_adam [YES] [YES]fused_lamb ................... ................... [OKAY] [YES][YES][OKAY]fused_lamb ......................... [OKAY][OKAY]fused_lamb[YES] ................... fused_lamb[OKAY][YES] ................... [YES][OKAY] ...... sparse_attn[OKAY] ............ [NO] ....... [OKAY] sparse_attn sparse_attntransformer............ ............sparse_attn............[NO] [YES] [NO]............ ............. ....... [OKAY][NO][OKAY] [OKAY] ....... transformer[OKAY] transformerstochastic_transformer............ ............[YES]. transformer [YES]...... [YES] ............ [OKAY]............ [YES][OKAY][OKAY] stochastic_transformer...... .stochastic_transformer[OKAY] [YES] ....... stochastic_transformer [YES] [OKAY] ....... [YES][OKAY] ...... 
[OKAY] ninjaninjaninja ninja .................................... .................................... [OKAY][OKAY] [OKAY] [OKAY]-------------------------------------------------- -------------------------------------------------- --------------------------------------------------op name-------------------------------------------------- op name op name................................ op name................ installed ................installed..installed installed ..compatible .. .. compatiblecompatible-------------------------------------------------- compatible-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adam ............... cpu_adam[YES] cpu_adam..................... cpu_adam...............[OKAY] [YES] [YES]............... ...... ...... [YES]fused_adam [OKAY] [OKAY]...... ............. [OKAY][YES] ...... [OKAY] fused_lambfused_adamfused_adam fused_adam ....................................... ............. [YES] [YES][YES] [YES] .................. ...... [OKAY] [OKAY][OKAY] [OKAY] fused_lambfused_lambfused_lamb ....................................... [YES][YES] [YES] ...... ...... sparse_attn......[OKAY] [OKAY] [OKAY]............ [NO] ....... [OKAY] transformer ............ [YES] ...... sparse_attn[OKAY]sparse_attn sparse_attn .................................... [NO]stochastic_transformer[NO] [NO] .............. . .......[OKAY][YES][OKAY] [OKAY]...... [OKAY]transformer transformer transformer ............ ............ ............ [YES][YES] [YES]............ ......[OKAY][OKAY] [OKAY] stochastic_transformerstochastic_transformer stochastic_transformer. .[YES]. ......[YES][YES] [OKAY]............ 
[OKAY][OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop nameop name ................ 
................................installed ................ installedinstalled .. installed....compatible ..compatiblecompatible-------------------------------------------------- compatible---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ............... cpu_adam[YES] cpu_adamcpu_adam ............... ....................................[YES] [YES] [OKAY]...... [YES] ...... [OKAY] ...... [OKAY] [OKAY]fused_adam ............. fused_adam[YES] ................... fused_adam[YES][OKAY] .............fused_adam...... [YES][OKAY] fused_lamb................... [OKAY]............. [YES] fused_lamb[YES] fused_lamb......................... .............[YES][OKAY][OKAY] [YES] ...... ......[OKAY]fused_lamb [OKAY] ............. [YES] ...... [OKAY]sparse_attn ............ [NO] .......sparse_attn [OKAY]............ [NO]sparse_attn ...................transformer [OKAY]............[NO]sparse_attn [YES].......transformer ............ ...... ............[OKAY] [NO] [OKAY] [YES] ............. stochastic_transformertransformer [OKAY][OKAY] ............ . [YES]stochastic_transformer[YES]transformer ......................... [OKAY] [YES][OKAY] [YES] ...... stochastic_transformer......[OKAY] .[OKAY] [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
--------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................. .................................... .................. [OKAY] [OKAY][OKAY] [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop nameop nameop name ................................................................ installedinstalledinstalledinstalled ........ compatible compatible compatiblecompatible -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ cpu_adam ...............cpu_adamcpu_adam cpu_adam [YES] ............................................. [YES]......[YES][YES] [OKAY]............ ......[OKAY][OKAY] [OKAY] fused_adamfused_adamfused_adamfused_adam .......................... ............. ............. [YES] [YES][YES][YES] ........................ [OKAY][OKAY][OKAY][OKAY] fused_lambfused_lambfused_lamb fused_lamb ............. ............. 
..........................[YES][YES] [YES] ............ [YES] ......[OKAY][OKAY] ......[OKAY] [OKAY] sparse_attnsparse_attn sparse_attn........................sparse_attn ............[NO][NO] ............ [NO] .............. [NO] .......[OKAY] [OKAY] .......[OKAY] [OKAY] transformer transformertransformer............transformer ............[YES]........................ ......[YES][YES][YES] [OKAY]...... ...... ......[OKAY][OKAY] [OKAY]stochastic_transformer stochastic_transformer stochastic_transformer .stochastic_transformer.. [YES][YES] . [YES]...... ...... [OKAY][YES]......[OKAY] ......[OKAY] [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ......
[OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... 
[NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. ............... [NO] ....... [NO] transformer_inferenceasync_io ................. [NO][NO] .............. [OKAY][NO] utils .................. [YES] ...... [OKAY] transformer_inference .. quantizer[NO] ..................... [NO][OKAY] ....... [OKAY] utils .................. --------------------------------------------------[YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 
0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY]  [WARNING]  async_io: please install the libaio-devel package with yum --------------------------------------------------  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY]  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 
1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. utils .................. [YES] ...... [OKAY] async_io ............... [NO] ....... [NO] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io: please install the libaio-devel package with yum utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io: please install the libaio-devel package with yum utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']  [WARNING]  async_io: please install the libaio-devel package with yum torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_ioasync_io .............................. [NO][NO] .............. [NO] [NO]  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] async_ioutils utils................................. ..................[NO][YES] .......[YES]...... [NO] ...... [OKAY] [OKAY] quantizer ..............quantizer [NO].............. .......transformer_inference[NO] [OKAY]......... [NO][OKAY] .......-------------------------------------------------- [OKAY] -------------------------------------------------- utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:  [WARNING]  async_io: please install the libaio-devel package with yum torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] DeepSpeed general environment info: transformer_inference .. [NO] ....... 
[OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. utils .................. [YES] ...... [OKAY] torch cuda version ............... 11.1 quantizer .............. [NO] ....... [OKAY] nvcc version ..................... 11.2 async_io ............... [NO] ....... [NO] -------------------------------------------------- deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
**** Git info
for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** using world size: 64, data-parallel-size: 32, tensor-model-parallel size: 1, pipeline-model-parallel size: 2 using torch.float16 for parameters ... ------------------------ arguments ------------------------ accumulate_allreduce_grads_in_fp32 .............. False adam_beta1 ...................................... 0.9 adam_beta2 ...................................... 0.999 adam_eps ........................................ 1e-08 adlr_autoresume ................................. False adlr_autoresume_interval ........................ 1000 apply_query_key_layer_scaling ................... True apply_residual_connection_post_layernorm ........ False attention_dropout ............................... 0.1 attention_softmax_in_fp32 ....................... False bert_binary_head ................................ True bert_load ....................................... None bf16 ............................................ False bias_dropout_fusion ............................. True bias_gelu_fusion ................................ True biencoder_projection_dim ........................ 0 biencoder_shared_query_context_model ............ False block_data_path ................................. None checkpoint_activations .......................... True checkpoint_in_cpu ............................... 
False checkpoint_num_layers ........................... 1 clip_grad ....................................... 1.0 codecarbon_dir .................................. None consumed_train_samples .......................... 0 consumed_train_tokens ........................... 0 consumed_valid_samples .......................... 0 contigious_checkpointing ........................ False cpu_optimizer ................................... False cpu_torch_adam .................................. False curriculum_learning ............................. False data_impl ....................................... mmap data_parallel_size .............................. 32 data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document'] dataloader_type ................................. single DDP_impl ........................................ local decoder_seq_length .............................. None deepscale ....................................... False deepscale_config ................................ None deepspeed ....................................... True deepspeed_activation_checkpointing .............. True deepspeed_config ................................ ./ds_config.2150412.json deepspeed_mpi ................................... False distribute_checkpointed_activations ............. False distributed_backend ............................. nccl embedding_path .................................. None encoder_seq_length .............................. 2048 eod_mask_loss ................................... False eval_interval ................................... 1000 eval_iters ...................................... 100 evidence_data_path .............................. None exit_duration_in_mins ........................... 1190 exit_interval ................................... None ffn_hidden_size ................................. 8192 finetune ........................................ 
False fp16 ............................................ True fp16_lm_cross_entropy ........................... False fp32_residual_connection ........................ False gigaflos_no_embeds .............................. 0 global_batch_size ............................... 512 glu_activation .................................. None hidden_dropout .................................. 0.1 hidden_size ..................................... 2048 hysteresis ...................................... 2 ict_head_size ................................... None ict_load ........................................ None img_dim ......................................... 224 indexer_batch_size .............................. 128 indexer_log_interval ............................ 1000 init_method_std ................................. 0.02 init_method_xavier_uniform ...................... False initial_loss_scale .............................. 4294967296 kv_channels ..................................... 128 layernorm_epsilon ............................... 1e-05 lazy_mpu_init ................................... None load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints local_rank ...................................... 0 log_batch_size_to_tensorboard ................... True log_interval .................................... 200 log_learning_rate_to_tensorboard ................ True log_level ....................................... None log_level_replica ............................... None log_loss_scale_to_tensorboard ................... True log_num_zeros_in_grad ........................... False log_params_norm ................................. False log_timers_to_tensorboard ....................... True log_validation_ppl_to_tensorboard ............... True loss_on_targets_only ............................ True loss_scale ...................................... 
12.0 loss_scale_window ............................... 1000 lr .............................................. 0.0002 lr_decay_iters .................................. None lr_decay_samples ................................ 73242187 lr_decay_style .................................. cosine lr_decay_tokens ................................. None lr_warmup_fraction .............................. None lr_warmup_iters ................................. 0 lr_warmup_samples ............................... 183105 make_vocab_size_divisible_by .................... 128 mask_prob ....................................... 0.15 masked_softmax_fusion ........................... True max_position_embeddings ......................... 2048 memory_centric_tiled_linear ..................... False merge_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/data/gpt2-merges.txt micro_batch_size ................................ 1 min_loss_scale .................................. 1.0 min_lr .......................................... 1e-05 mmap_warmup ..................................... False no_load_optim ................................... None no_load_rng ..................................... None no_save_optim ................................... None no_save_rng ..................................... None num_attention_heads ............................. 16 num_channels .................................... 3 num_classes ..................................... 1000 num_layers ...................................... 24 num_layers_per_virtual_pipeline_stage ........... None num_workers ..................................... 2 onnx_safe ....................................... None openai_gelu ..................................... False optimizer ....................................... adam override_lr_scheduler ........................... False params_dtype .................................... 
torch.float16 partition_activations ........................... False patch_dim ....................................... 16 pipeline_model_parallel_size .................... 2 position_embedding_type ......................... PositionEmbeddingType.absolute profile_backward ................................ False query_in_block_prob ............................. 0.1 rampup_batch_size ............................... ['32', '32', '2_000_000'] rank ............................................ 0 remote_device ................................... none reset_attention_mask ............................ False reset_position_ids .............................. False retriever_report_topk_accuracies ................ [] retriever_score_scaling ......................... False retriever_seq_length ............................ 256 reweight_loss_based_on_position_frequency ....... True sample_rate ..................................... 1.0 save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints save_interval ................................... 1500 scatter_gather_tensors_in_pipeline .............. True scattered_embeddings ............................ False seed ............................................ 1234 seq_length ...................................... 2048 sgd_momentum .................................... 0.9 short_seq_prob .................................. 0.1 split ........................................... 949,50,1 split_transformers .............................. False synchronize_each_layer .......................... False tensor_model_parallel_size ...................... 1 tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/tr6g-1B3-oscar-loss-reweighting-logs/tensorboard tensorboard_log_interval ........................ 1 tensorboard_queue_size .......................... 
5 test_weighted_split_names ....................... None test_weighted_split_paths ....................... None test_weighted_split_splits ...................... None test_weighted_split_weights ..................... None tile_factor ..................................... 1 titles_data_path ................................ None tokenizer_name_or_path .......................... None tokenizer_type .................................. GPT2BPETokenizer train_iters ..................................... None train_samples ................................... 73242187 train_tokens .................................... None train_weighted_split_paths ...................... None use_bnb_optimizer ............................... False use_checkpoint_lr_scheduler ..................... False use_contiguous_buffers_in_ddp ................... False use_cpu_initialization .......................... None use_one_sent_docs ............................... False use_pin_memory .................................. False valid_weighted_split_names ...................... None valid_weighted_split_paths ...................... None valid_weighted_split_splits ..................... None valid_weighted_split_weights .................... None virtual_pipeline_model_parallel_size ............ None vocab_extra_ids ................................. 0 vocab_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/data/gpt2-vocab.json weight_decay .................................... 0.1 world_size ...................................... 64 zero_allgather_bucket_size ...................... 0.0 zero_contigious_gradients ....................... False zero_reduce_bucket_size ......................... 0.0 zero_reduce_scatter ............................. False zero_stage ...................................... 
1 -------------------- end of arguments --------------------- will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples. > building GPT2BPETokenizer tokenizer ... > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) > initializing torch distributed ... > setting tensorboard ... > initializing tensor model parallel with size 1 > initializing pipeline model parallel with size 2 > setting random seeds to 1234 ... [2021-11-24 05:29:01,121] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 > compiling dataset index builder ... make: Entering directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/data' make: Nothing to be done for 'default'. make: Leaving directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/data' >>> done with dataset index builder. Compilation time: 0.330 seconds > compiling and loading fused kernels ... /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! 
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module scaled_upper_triang_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module scaled_upper_triang_masked_softmax_cuda... /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module scaled_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module scaled_masked_softmax_cuda... /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module fused_mix_prec_layer_norm_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module fused_mix_prec_layer_norm_cuda... /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( >>> done with compiling and loading fused kernels. Compilation time: 20.007 seconds time to initialize megatron (seconds): 33.460 [after megatron is initialized] datetime: 2021-11-24 05:29:21 building GPT model ...
[2021-11-24 05:29:21,500] [INFO] [utils.py:806:see_memory_usage] Before Building Model
[2021-11-24 05:29:21,501] [INFO] [utils.py:807:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-11-24 05:29:21,501] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 38.69 GB, percent = 20.7%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pipe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=1, data=0, model=0): 32, ProcessCoord(pipe=1, data=1, model=0): 33, ProcessCoord(pipe=1, data=2, model=0): 34, ProcessCoord(pipe=1, data=3, model=0): 35, ProcessCoord(pipe=1, data=4, model=0): 36, ProcessCoord(pipe=1, data=5, model=0): 37, ProcessCoord(pipe=1, data=6, model=0): 38, ProcessCoord(pipe=1, data=7, model=0): 39, ProcessCoord(pipe=1, data=8, model=0): 40, ProcessCoord(pipe=1, data=9, model=0): 41, ProcessCoord(pipe=1, data=10, model=0): 42, ProcessCoord(pipe=1, data=11, model=0): 43, ProcessCoord(pipe=1, data=12, model=0): 44, ProcessCoord(pipe=1, data=13, model=0): 45, ProcessCoord(pipe=1, data=14, model=0): 46, ProcessCoord(pipe=1, data=15, model=0): 47, ProcessCoord(pipe=1, data=16, model=0): 48, ProcessCoord(pipe=1, data=17, model=0): 49, ProcessCoord(pipe=1, data=18, model=0): 50, ProcessCoord(pipe=1, data=19, model=0): 51, ProcessCoord(pipe=1, data=20, model=0): 52, ProcessCoord(pipe=1, data=21, model=0): 53, ProcessCoord(pipe=1, data=22, model=0): 54, ProcessCoord(pipe=1, data=23, model=0): 55, ProcessCoord(pipe=1, data=24, model=0): 56, ProcessCoord(pipe=1, data=25, model=0): 57, ProcessCoord(pipe=1, data=26, model=0): 58, ProcessCoord(pipe=1, data=27, model=0): 59, ProcessCoord(pipe=1, data=28, model=0): 60, ProcessCoord(pipe=1, data=29, model=0): 61, ProcessCoord(pipe=1, data=30, model=0): 62, ProcessCoord(pipe=1, data=31, model=0): 63}
[2021-11-24 05:29:22,799] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=15
0: _to_float16
1: EmbeddingPipe
2:
3: ParallelTransformerLayerPipe
4: ParallelTransformerLayerPipe
5: ParallelTransformerLayerPipe
6: ParallelTransformerLayerPipe
7: ParallelTransformerLayerPipe
8: ParallelTransformerLayerPipe
9: ParallelTransformerLayerPipe
10: ParallelTransformerLayerPipe
11: ParallelTransformerLayerPipe
12: ParallelTransformerLayerPipe
13: ParallelTransformerLayerPipe
14: ParallelTransformerLayerPipe
stage=1 layers=17
15: ParallelTransformerLayerPipe
16: ParallelTransformerLayerPipe
17: ParallelTransformerLayerPipe
18: ParallelTransformerLayerPipe
19: ParallelTransformerLayerPipe
20: ParallelTransformerLayerPipe
21: ParallelTransformerLayerPipe
22: ParallelTransformerLayerPipe
23: ParallelTransformerLayerPipe
24: ParallelTransformerLayerPipe
25: ParallelTransformerLayerPipe
26: ParallelTransformerLayerPipe
27:
28:
29: MixedFusedLayerNorm
30: EmbeddingPipe
31: float16_to_fp32
loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 711520256
[2021-11-24 05:29:23,340] [INFO] [utils.py:806:see_memory_usage] After Building Model
[2021-11-24 05:29:23,340] [INFO] [utils.py:807:see_memory_usage] MA 1.33 GB Max_MA 1.33 GB CA 1.36 GB Max_CA 1 GB
[2021-11-24 05:29:23,340] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 38.92 GB, percent = 20.8%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 711516160
setting training iterations to 152972
> learning rate decay style: cosine
DeepSpeed is enabled.
[2021-11-24 05:29:23,360] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.5.5+58a8e13, git-hash=58a8e13, git-branch=master
[2021-11-24 05:29:23,675] [INFO] [engine.py:207:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-11-24 05:29:23,675] [INFO] [engine.py:862:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-11-24 05:29:23,675] [INFO] [engine.py:868:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-11-24 05:29:23,679] [INFO] [engine.py:884:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-11-24 05:29:23,679] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-11-24 05:29:23,679] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-11-24 05:29:23,679] [INFO] [stage2.py:111:__init__] Reduce bucket size 500000000
[2021-11-24 05:29:23,679] [INFO] [stage2.py:112:__init__] Allgather bucket size 500000000
[2021-11-24 05:29:23,679] [INFO] [stage2.py:113:__init__] CPU Offload: False
[2021-11-24 05:29:23,679] [INFO] [stage2.py:114:__init__] Round robin gradient
partitioning: False
Rank: 34 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 39 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 58 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 23 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 36 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 26 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 32 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 1 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 3 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 56 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 50 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 51 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 21 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 8 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 5 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 14 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 62 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 63 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 7 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 44 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 24 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 17 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 12 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 43 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 52 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 42 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 10 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 0 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 45 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 53 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 37 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 29 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 49 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 57 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 18 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 61 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 47 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 48 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 46 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 33 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 28 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 60 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 38 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 41 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 4 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 40 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 9 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 59 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 35 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 13 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 25 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 55 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 6 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 27 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 31 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 11 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 20 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 22 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 16 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 15 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 54 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 30 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 19 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 2 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
[2021-11-24 05:29:25,277] [INFO] [utils.py:806:see_memory_usage] Before initializing optimizer states
[2021-11-24 05:29:25,278] [INFO] [utils.py:807:see_memory_usage] MA 1.41 GB Max_MA 1.45 GB CA 2.71 GB Max_CA 3 GB
[2021-11-24 05:29:25,278] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.7 GB, percent = 21.7%
[2021-11-24 05:29:25,313] [INFO] [utils.py:806:see_memory_usage] After initializing optimizer states
[2021-11-24 05:29:25,314] [INFO] [utils.py:807:see_memory_usage] MA 1.57 GB Max_MA 1.66 GB CA 2.96 GB Max_CA 3 GB
[2021-11-24 05:29:25,314] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.85 GB, percent = 21.8%
[2021-11-24 05:29:25,314] [INFO] [stage2.py:474:__init__] optimizer state initialized
[2021-11-24 05:29:25,341] [INFO] [utils.py:806:see_memory_usage] After initializing ZeRO optimizer
[2021-11-24 05:29:25,342] [INFO] [utils.py:807:see_memory_usage] MA 1.57 GB Max_MA 1.57 GB CA 2.96 GB Max_CA 3
GB
[2021-11-24 05:29:25,342] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.85 GB, percent = 21.8%
[2021-11-24 05:29:25,342] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-11-24 05:29:25,342] [INFO] [engine.py:599:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-11-24 05:29:25,342] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-11-24 05:29:25,342] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-11-24 05:29:25,342] [INFO] [config.py:940:print] DeepSpeedEngine configuration:
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false }
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] allreduce_always_fp32 ........ False
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] amp_enabled .................. False
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] amp_params ................... False
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] checkpoint_tag_validation_enabled True
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] checkpoint_tag_validation_fail False
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] curriculum_enabled ........... False
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] curriculum_params ............ False
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] dataloader_drop_last ......... False
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] disable_allgather ............ False
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] dump_state ................... False
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] eigenvalue_enabled ........... False
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] eigenvalue_gas_boundary_resolution 1
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] eigenvalue_layer_num ......... 0
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] eigenvalue_max_iter .......... 100
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] eigenvalue_stability ......... 1e-06
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] eigenvalue_tol ............... 0.01
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] eigenvalue_verbose ........... False
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] elasticity_enabled ........... False
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null }
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] fp16_enabled ................. True
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] fp16_master_weights_and_gradients False
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] fp16_mixed_quantize .......... False
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] global_rank .................. 0
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] gradient_accumulation_steps .. 16
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] gradient_clipping ............ 1.0
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] gradient_predivide_factor .... 1.0
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] initial_dynamic_scale ........ 4096
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] loss_scale ................... 0
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] memory_breakdown ............. False
[2021-11-24 05:29:25,343] [INFO] [config.py:944:print] optimizer_legacy_fusion ...... False
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] optimizer_name ............... None
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] optimizer_params ............. None
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] pld_enabled .................. False
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] pld_params ................... False
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] prescale_gradients ........... False
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] quantize_change_rate ......... 0.001
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] quantize_groups .............. 1
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] quantize_offset .............. 1000
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] quantize_period .............. 1000
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] quantize_rounding ............ 0
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] quantize_start_bits .......... 16
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] quantize_target_bits ......... 8
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] quantize_training_enabled .... False
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] quantize_type ................ 0
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] quantize_verbose ............. False
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] scheduler_name ............... None
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] scheduler_params ............. None
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] sparse_attention ............. None
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] sparse_gradients_enabled ..... False
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] steps_per_print .............. 2000
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] tensorboard_enabled .......... False
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] tensorboard_output_path ......
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] train_batch_size ............. 512
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] train_micro_batch_size_per_gpu 1
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] use_quantizer_kernel ......... False
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] wall_clock_breakdown ......... False
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] world_size ................... 32
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] zero_allow_untested_optimizer False
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] zero_config .................. { "stage": 1, "contiguous_gradients": true, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false }
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] zero_enabled ................. True
[2021-11-24 05:29:25,344] [INFO] [config.py:944:print] zero_optimization_stage ...... 1
[2021-11-24 05:29:25,344] [INFO] [config.py:946:print] json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 512, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false }
[2021-11-24 05:29:25,345] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=16 micro_batch_size=1
[2021-11-24 05:29:25,374] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=15 [0, 15) STAGE_PARAMS=711516160 (711.516M) TOTAL_PARAMS=1423036416 (1423.036M) UNIQUE_PARAMS=1315819520 (1315.820M)
[2021-11-24 05:29:25,374] [INFO] [engine.py:151:__init__] RANK=32 STAGE=1 LAYERS=17 [15, 32) STAGE_PARAMS=711520256 (711.520M) TOTAL_PARAMS=1423036416 (1423.036M) UNIQUE_PARAMS=1315819520 (1315.820M)
> using checkpoint value 0.0002 for learning rate
> using checkpoint value 1e-05 for minimum learning rate
> using checkpoint value 183105 for warmup iterations
> using checkpoint value 73242187 for total number of iterations
> using checkpoint value cosine for decay style
successfully loaded 32 ZeRO state_dicts for rank 57
successfully loaded 32 ZeRO state_dicts for rank 49
successfully loaded 32 ZeRO state_dicts for rank 47
successfully loaded 32 ZeRO state_dicts for rank 56
successfully loaded 32 ZeRO state_dicts for rank 59
successfully loaded 32 ZeRO state_dicts for rank 44
successfully loaded 32 ZeRO state_dicts for rank 46
successfully loaded 32 ZeRO state_dicts for rank 48
successfully loaded 32 ZeRO state_dicts for rank 51
successfully loaded 32 ZeRO state_dicts for rank 58
successfully loaded 32 ZeRO state_dicts for rank 33
successfully loaded 32 ZeRO state_dicts for rank 34
successfully loaded 32 ZeRO state_dicts for rank 43
successfully loaded 32 ZeRO state_dicts for rank 42
successfully
loaded 32 ZeRO state_dicts for rank 40
successfully loaded 32 ZeRO state_dicts for rank 50
successfully loaded 32 ZeRO state_dicts for rank 62
successfully loaded 32 ZeRO state_dicts for rank 63
successfully loaded 32 ZeRO state_dicts for rank 61
successfully loaded 32 ZeRO state_dicts for rank 41
successfully loaded 32 ZeRO state_dicts for rank 60
successfully loaded 32 ZeRO state_dicts for rank 35
successfully loaded 32 ZeRO state_dicts for rank 32
successfully loaded 32 ZeRO state_dicts for rank 37
successfully loaded 32 ZeRO state_dicts for rank 38
successfully loaded 32 ZeRO state_dicts for rank 36
successfully loaded 32 ZeRO state_dicts for rank 39
successfully loaded 32 ZeRO state_dicts for rank 55
successfully loaded 32 ZeRO state_dicts for rank 54
successfully loaded 32 ZeRO state_dicts for rank 52
successfully loaded 32 ZeRO state_dicts for rank 53
successfully loaded 32 ZeRO state_dicts for rank 4
successfully loaded 32 ZeRO state_dicts for rank 45
successfully loaded 32 ZeRO state_dicts for rank 6
successfully loaded 32 ZeRO state_dicts for rank 7
successfully loaded 32 ZeRO state_dicts for rank 5
successfully loaded 32 ZeRO state_dicts for rank 3
successfully loaded 32 ZeRO state_dicts for rank 0
successfully loaded 32 ZeRO state_dicts for rank 2
successfully loaded 32 ZeRO state_dicts for rank 1
successfully loaded 32 ZeRO state_dicts for rank 18
successfully loaded 32 ZeRO state_dicts for rank 22
successfully loaded 32 ZeRO state_dicts for rank 19
successfully loaded 32 ZeRO state_dicts for rank 16
successfully loaded 32 ZeRO state_dicts for rank 23
successfully loaded 32 ZeRO state_dicts for rank 17
successfully loaded 32 ZeRO state_dicts for rank 21
successfully loaded 32 ZeRO state_dicts for rank 20
successfully loaded 32 ZeRO state_dicts for rank 9
successfully loaded 32 ZeRO state_dicts for rank 11
successfully loaded 32 ZeRO state_dicts for rank 10
successfully loaded 32 ZeRO state_dicts for rank 8
successfully loaded 32 ZeRO state_dicts for rank 27
successfully loaded 32 ZeRO state_dicts for rank 25
successfully loaded 32 ZeRO state_dicts for rank 24
successfully loaded 32 ZeRO state_dicts for rank 26
successfully loaded 32 ZeRO state_dicts for rank 29
successfully loaded 32 ZeRO state_dicts for rank 30
successfully loaded 32 ZeRO state_dicts for rank 31
successfully loaded 32 ZeRO state_dicts for rank 28
successfully loaded 32 ZeRO state_dicts for rank 15
successfully loaded 32 ZeRO state_dicts for rank 13
successfully loaded 32 ZeRO state_dicts for rank 14
successfully loaded 32 ZeRO state_dicts for rank 12
loading 32 zero partition checkpoints for rank 48
loading 32 zero partition checkpoints for rank 49
loading 32 zero partition checkpoints for rank 46
loading 32 zero partition checkpoints for rank 47
loading 32 zero partition checkpoints for rank 43
loading 32 zero partition checkpoints for rank 44
loading 32 zero partition checkpoints for rank 42
loading 32 zero partition checkpoints for rank 56
loading 32 zero partition checkpoints for rank 57
loading 32 zero partition checkpoints for rank 58
loading 32 zero partition checkpoints for rank 61
loading 32 zero partition checkpoints for rank 50
loading 32 zero partition checkpoints for rank 40
loading 32 zero partition checkpoints for rank 33
loading 32 zero partition checkpoints for rank 60
loading 32 zero partition checkpoints for rank 51
loading 32 zero partition checkpoints for rank 41
loading 32 zero partition checkpoints for rank 35
loading 32 zero partition checkpoints for rank 63
loading 32 zero partition checkpoints for rank 59
loading 32 zero partition checkpoints for rank 34
loading 32 zero partition checkpoints for rank 32
loading 32 zero partition checkpoints for rank 62
loading 32 zero partition checkpoints for rank 36
loading 32 zero partition checkpoints for rank 38
loading 32 zero partition checkpoints for rank 4
loading 32 zero partition checkpoints for rank 3
loading 32 zero partition checkpoints for rank 55
loading 32 zero partition checkpoints for rank 2
loading 32 zero partition checkpoints for rank 6
loading 32 zero partition checkpoints for rank 39
loading 32 zero partition checkpoints for rank 45
loading 32 zero partition checkpoints for rank 0
loading 32 zero partition checkpoints for rank 18
loading 32 zero partition checkpoints for rank 22
loading 32 zero partition checkpoints for rank 37
loading 32 zero partition checkpoints for rank 21
loading 32 zero partition checkpoints for rank 20
checkpoint version 3.0
loading 32 zero partition checkpoints for rank 17
loading 32 zero partition checkpoints for rank 7
loading 32 zero partition checkpoints for rank 54
loading 32 zero partition checkpoints for rank 1
loading 32 zero partition checkpoints for rank 5
loading 32 zero partition checkpoints for rank 10
loading 32 zero partition checkpoints for rank 23
loading 32 zero partition checkpoints for rank 8
loading 32 zero partition checkpoints for rank 24
loading 32 zero partition checkpoints for rank 19
loading 32 zero partition checkpoints for rank 9
loading 32 zero partition checkpoints for rank 11
loading 32 zero partition checkpoints for rank 16
loading 32 zero partition checkpoints for rank 27
loading 32 zero partition checkpoints for rank 25
loading 32 zero partition checkpoints for rank 29
loading 32 zero partition checkpoints for rank 26
loading 32 zero partition checkpoints for rank 30
loading 32 zero partition checkpoints for rank 13
loading 32 zero partition checkpoints for rank 15
loading 32 zero partition checkpoints for rank 12
loading 32 zero partition checkpoints for rank 31
loading 32 zero partition checkpoints for rank 14
loading 32 zero partition checkpoints for rank 28
loading 32 zero partition checkpoints for rank 53
loading 32 zero partition checkpoints for rank 52
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints at iteration 51780
time (ms) | load-checkpoint: 11589.82
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP
> 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") 
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the 
embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP 
> 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") 
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the 
embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP 
> 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") 
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the 
embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP 
> 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") 
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the 
embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
estimated model parameters: 1.42303232
estimated model parameters without embeddings: 1.208598528
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-11-24 05:29:36
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
train: 73242187
validation: 7833600
test: 51200
> building train, validation, and test datasets for GPT ...
> building dataset index ...
reading sizes...
reading pointers...
reading document index...
creating numpy buffer of mmap...
creating memory view of numpy buffer...
> finished creating indexed dataset in 6.935014 seconds
number of documents: 304230423
> dataset split:
train: document indices in [0, 288714672) total of 288714672 documents
validation: document indices in [288714672, 303926193) total of 15211521 documents
test: document indices in [303926193, 304230423) total of 304230 documents
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
loaded indexed file in 0.133 seconds
total number of samples: 131537224
total number of epochs: 1
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
loaded indexed file in 0.205 seconds
total number of samples: 13854322
total number of epochs: 2
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
loaded indexed file in 0.068 seconds
total number of samples: 137384
total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-11-24 05:29:49
done with setup ...
training ...
time (ms) | model-and-optimizer-setup: 15536.47 | train/valid/test-data-iterators-setup: 12106.84
Number of parameters: 1.42303232 billion
Number of parameters: 1.423040512 billion
Number of parameters without embeddings: 1.208598528 billion
Number of parameters without embeddings: 1.20860672 billion
[before the start of training step] datetime: 2021-11-24 05:29:49
[2021-11-24 05:29:49,387] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information
[2021-11-24 05:29:49,387] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-11-24 05:29:49,387] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-11-24 05:29:49,387] [INFO] [checkpointing.py:554:forward] ----Synchronization False
[2021-11-24 05:29:49,387] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False
/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
if t.grad is not None:
/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward().
If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
if t.grad is not None:
[Rank 0] (after 51800 iterations) memory (MB) | allocated: 1631.6650390625 | max allocated: 3929.2744140625 | reserved: 6816.0 | max reserved: 6816.0
[Rank 32] (after 51800 iterations) memory (MB) | allocated: 2443.63623046875 | max allocated: 4725.25341796875 | reserved: 7900.0 | max reserved: 7900.0
iteration 51800/ 152972 | consumed samples: 21441984 | consumed tokens: 43913183232 | elapsed time per iteration (ms): 4792.8 | learning rate: 1.631E-04 | global batch size: 512 | lm loss: 1.390413E+00 | loss scale: 65536.0 | grad norm: 5740.618 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-24 05:46:55,276] [INFO] [logging.py:68:log_dist] [Rank 0] step=52000, skipped=105, lr=[0.00016280683275794736, 0.00016280683275794736], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 52000/ 152972 | consumed samples: 21544384 | consumed tokens: 44122898432 | elapsed time per iteration (ms): 4651.2 | learning rate: 1.628E-04 | global batch size: 512 | lm loss: 1.493416E+00 | loss scale: 131072.0 | grad norm: 14470.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
steps: 52000 loss: 1.9383 iter time (s): 0.002 samples/sec: 219340.690
-------------------------------------------------------------------------------------------
valid loss at iteration 52000 | lm loss value: 1.519452E+00 | lm loss PPL: 4.569721E+00 |
-------------------------------------------------------------------------------------------
iteration 52200/ 152972 | consumed samples: 21646784 | consumed tokens: 44332613632 | elapsed time per iteration (ms): 5207.9 | learning rate: 1.625E-04 | global batch size: 512 | lm loss: 1.512332E+00 | loss scale: 131072.0 | grad norm: 13192.860 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 52400/ 152972 | consumed samples: 21749184 | consumed tokens: 44542328832 | elapsed time per iteration (ms): 4671.8 | learning rate: 1.621E-04 | global batch size: 512 | lm loss: 1.528106E+00 | loss scale: 262144.0 | grad norm: 21369.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 52500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-24 06:27:40,497] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/mp_rank_00_model_states.pt
[2021-11-24 06:27:40,927] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_10_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,927] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_22_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,930] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_16_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,931] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_7_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,933] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_28_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,935] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_2_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,935] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_23_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,935] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_29_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,935] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_11_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,937] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_6_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,937] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_26_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,938] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_25_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,938] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_22_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,939] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_24_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,939] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_2_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,940] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_1_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,941] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_9_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,941] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_5_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,942] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_6_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,942] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_1_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,943] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_21_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,944] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_29_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,945] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_28_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,946] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_14_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,947] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_27_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,950] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_15_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,952] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_27_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,954] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_25_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,955] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_16_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,961] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_13_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,961] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,962] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_13_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,965] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_9_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,965] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_17_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,967] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_30_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,969] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_19_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,969] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_31_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,969] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_21_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,970] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_26_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,971] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_0_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,974] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_7_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,974] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_4_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,974] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_20_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,976] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_5_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,976] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_8_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,977] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_3_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,978] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_20_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,978] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_14_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,978] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_17_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,978] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_18_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,979] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_23_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,980] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_8_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,980] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_3_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,981] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_30_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,982] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_12_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,982] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_31_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,983] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_4_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,984] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_11_mp_rank_01_optim_states.pt
[2021-11-24 06:27:40,986] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_19_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,987] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_24_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,988] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_18_mp_rank_00_optim_states.pt
[2021-11-24 06:27:40,991] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_15_mp_rank_01_optim_states.pt
[2021-11-24 06:27:41,000] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_10_mp_rank_00_optim_states.pt
[2021-11-24 06:27:41,001] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step52500/zero_pp_rank_12_mp_rank_01_optim_states.pt
successfully saved checkpoint at iteration 52500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2739.06
iteration 52600/ 152972 | consumed samples: 21851584 | consumed tokens: 44752044032 | elapsed time per iteration (ms): 4671.6 | learning rate: 1.618E-04 | global batch size: 512 | lm loss: 1.524816E+00 | loss scale: 131072.0 | grad norm: 16635.966 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 52800/ 152972 | consumed samples: 21953984 | consumed tokens: 44961759232 | elapsed time per iteration (ms): 4656.0 | learning rate: 1.615E-04 | global batch size: 512 | lm loss: 1.503093E+00 | loss scale: 131072.0 | grad norm: 10656.982 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 53000/ 152972 | consumed samples: 22056384 | consumed tokens: 45171474432 | elapsed time per iteration (ms): 4661.4 | learning rate: 1.611E-04 | global batch size: 512 | lm loss: 1.524425E+00 | loss scale: 131072.0 | grad norm: 12492.572 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 53000 | lm loss value: 1.579874E+00 | lm loss PPL: 4.854343E+00 |
-------------------------------------------------------------------------------------------
iteration 53200/ 152972 | consumed samples: 22158784 | consumed tokens: 45381189632 | elapsed time per iteration (ms): 5208.3 | learning rate: 1.608E-04 | global batch size: 512 | lm loss: 1.521222E+00 | loss scale: 131072.0 | grad norm: 20091.984 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 53400/ 152972 | consumed samples: 22261184 | consumed tokens: 45590904832 | elapsed time per iteration (ms): 4665.6 | learning rate: 1.605E-04 | global batch size: 512 | lm loss: 1.455462E+00 | loss scale: 131072.0 | grad norm: 12959.313 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 53600/ 152972 | consumed samples: 22363584 | consumed tokens: 45800620032 | elapsed time per iteration (ms): 4654.1 | learning rate: 1.601E-04 | global batch size: 512 | lm loss: 1.516826E+00 | loss scale: 131072.0 | grad norm: 13156.951 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 53800/ 152972 | consumed samples: 22465984 | consumed tokens: 46010335232 | elapsed time per iteration (ms): 4925.3 | learning rate: 1.598E-04 | global batch size: 512 | lm loss: 1.473294E+00 | loss scale: 262144.0 | grad norm: 11300.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-24 08:27:46,204] [INFO] [logging.py:68:log_dist] [Rank 0] step=54000, skipped=110, lr=[0.00015944089951004453, 0.00015944089951004453], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 54000 loss: 1.7589 iter time (s): 0.002 samples/sec: 219455.667
iteration 54000/ 152972 | consumed samples: 22568384 | consumed tokens: 46220050432 | elapsed time per iteration (ms): 4932.7 | learning rate: 1.594E-04 | global batch size: 512 | lm loss: 1.543969E+00 | loss scale: 262144.0 | grad norm: 29232.630 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 54000 | lm loss value: 1.491715E+00 | lm loss PPL: 4.444711E+00 |
-------------------------------------------------------------------------------------------
saving checkpoint at iteration 54000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-24 08:29:36,602] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/mp_rank_00_model_states.pt
[2021-11-24 08:29:37,019] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_23_mp_rank_01_optim_states.pt
[2021-11-24 08:29:37,024] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_25_mp_rank_01_optim_states.pt
[2021-11-24 08:29:37,028] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_1_mp_rank_01_optim_states.pt
[2021-11-24 08:29:37,029] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_4_mp_rank_00_optim_states.pt
[2021-11-24 08:29:37,030] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_30_mp_rank_01_optim_states.pt
[2021-11-24 08:29:37,032] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_22_mp_rank_00_optim_states.pt
[2021-11-24 08:29:37,036] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-24 08:29:37,037] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-24 08:29:37,037] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-24 08:29:37,038] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-24 08:29:37,038] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-24 08:29:37,039] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-24 08:29:37,039] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-24 08:29:37,039] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-24 08:29:37,040] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-24 08:29:37,041] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-24 08:29:37,041] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-24 08:29:37,042] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-24 08:29:37,043] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-24 08:29:37,044] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-24 08:29:37,045] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-24 08:29:37,045] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-24 08:29:37,046] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-24 08:29:37,046] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-24 08:29:37,048] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-24 08:29:37,052] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-24 08:29:37,056] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-24 08:29:37,057] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-24 08:29:37,061] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-24 08:29:37,062] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-24 08:29:37,062] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-24 08:29:37,064] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-24 08:29:37,066] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-24 08:29:37,067] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-24 08:29:37,067] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-24 08:29:37,068] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-24 08:29:37,070] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-24 08:29:37,071] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-24 08:29:37,071] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-24 08:29:37,072] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-24 08:29:37,072] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-24 08:29:37,072] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-24 08:29:37,073] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-24 08:29:37,074] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-24 08:29:37,074] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-24 08:29:37,074] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-24 08:29:37,075] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-24 08:29:37,075] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-24 08:29:37,075] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-24 08:29:37,078] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-24 08:29:37,078] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-24 08:29:37,082] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-24 08:29:37,082] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-24 08:29:37,082] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-24 08:29:37,083] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-24 08:29:37,083] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-24 08:29:37,085] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-24 08:29:37,087] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-24 08:29:37,090] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-24 08:29:37,092] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-24 08:29:37,093] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-24 08:29:37,099] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-24 08:29:37,103] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_6_mp_rank_01_optim_states.pt
[2021-11-24 08:29:37,114] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step54000/zero_pp_rank_18_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 54000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2718.02
iteration 54200/ 152972 | consumed samples: 22670784 | consumed tokens: 46429765632 | elapsed time per iteration (ms): 6477.6 | learning rate: 1.591E-04 | global batch size: 512 | lm loss: 1.565692E+00 | loss scale: 131072.0 | grad norm: 5804.174 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 54400/ 152972 | consumed samples: 22773184 | consumed tokens: 46639480832 | elapsed time per iteration (ms): 5674.4 | learning rate: 1.588E-04 | global batch size: 512 | lm loss: 1.518806E+00 | loss scale: 131072.0 | grad norm: 14949.651 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 54600/ 152972 | consumed samples: 22875584 | consumed tokens: 46849196032 | elapsed time per iteration (ms): 5328.3 | learning rate: 1.584E-04 | global batch size: 512 | lm loss: 1.473781E+00 | loss scale: 262144.0 | grad norm: 21590.042 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 54800/ 152972 | consumed samples: 22977984 | consumed tokens: 47058911232 | elapsed time per iteration (ms): 5522.4 | learning rate: 1.581E-04 | global batch size: 512 | lm loss: 1.568721E+00 | loss scale: 131072.0 | grad norm: 14879.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 55000/ 152972 | consumed samples: 23080384 | consumed tokens: 47268626432 | elapsed time per iteration (ms): 5979.2 | learning rate: 1.577E-04 | global batch size: 512 | lm loss: 1.533630E+00 | loss scale: 65536.0 | grad norm: 7671.071 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 55000 | lm loss value: 1.477763E+00 | lm loss PPL: 4.383130E+00 |
-------------------------------------------------------------------------------------------
iteration 55200/ 152972 | consumed samples: 23182784 | consumed tokens: 47478341632 | elapsed time per iteration (ms): 7634.0 | learning rate: 1.574E-04 | global batch size: 512 | lm loss: 1.518007E+00 | loss scale: 65536.0 | grad norm: 4743.299 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 55400/ 152972 | consumed samples: 23285184 | consumed tokens: 47688056832 | elapsed time per iteration (ms): 5930.9 | learning rate: 1.570E-04 | global batch size: 512 | lm loss: 1.516688E+00 | loss scale: 131072.0 | grad norm: 8451.889 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 55500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-24 10:59:16,855] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/mp_rank_00_model_states.pt
[2021-11-24 10:59:17,276] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_16_mp_rank_01_optim_states.pt
[2021-11-24 10:59:17,279] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_22_mp_rank_00_optim_states.pt
[2021-11-24 10:59:17,282] [INFO]
[engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-24 10:59:17,283] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,284] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,285] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,287] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-24 10:59:17,288] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-24 10:59:17,288] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-24 10:59:17,288] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,288] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,289] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,290] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-24 10:59:17,291] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-24 10:59:17,291] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-24 10:59:17,292] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-24 10:59:17,293] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-24 10:59:17,293] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,293] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-24 10:59:17,295] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,295] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,296] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,298] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-24 10:59:17,299] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-24 10:59:17,301] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-24 10:59:17,302] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,305] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,306] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-24 10:59:17,307] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,307] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-24 10:59:17,307] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,313] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,315] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,316] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-24 10:59:17,318] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,321] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-24 10:59:17,322] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,322] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,324] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,324] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,325] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-24 10:59:17,325] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-24 10:59:17,326] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,327] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,327] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,327] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,328] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-24 10:59:17,330] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-24 10:59:17,330] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-24 10:59:17,330] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,331] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-24 10:59:17,331] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,331] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,332] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-24 10:59:17,333] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-24 10:59:17,334] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,336] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,339] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-24 10:59:17,342] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_15_mp_rank_00_optim_states.pt
[2021-11-24 10:59:17,348] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_14_mp_rank_00_optim_states.pt
[2021-11-24 10:59:17,348] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_23_mp_rank_00_optim_states.pt
[2021-11-24 10:59:17,349] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2021-11-24 10:59:17,353] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_17_mp_rank_00_optim_states.pt
[2021-11-24 10:59:17,360] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step55500/zero_pp_rank_29_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 55500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2743.86
iteration 55600/ 152972 | consumed samples: 23387584 | consumed tokens: 47897772032 | elapsed time per iteration (ms): 6086.0 | learning rate: 1.567E-04 | global batch size: 512 | lm loss: 1.521057E+00 | loss scale: 131072.0 | grad norm: 12874.111 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 55800/ 152972 | consumed samples: 23489984 | consumed tokens: 48107487232 | elapsed time per iteration (ms): 6407.2 | learning rate: 1.563E-04 | global batch size: 512 | lm loss: 1.541004E+00 | loss scale: 131072.0 | grad norm: 12215.967 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-24 11:53:57,848] [INFO] [logging.py:68:log_dist] [Rank 0] step=56000, skipped=116, lr=[0.00015597172082979662, 0.00015597172082979662], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 56000 loss: 1.6408 iter time (s): 0.006 samples/sec: 92901.383
iteration 56000/ 152972 | consumed samples: 23592384 | consumed tokens: 48317202432 | elapsed time per iteration (ms): 6818.1 | learning rate: 1.560E-04 | global batch size: 512 | lm loss: 1.498326E+00 | loss scale: 65536.0 | grad norm: 7011.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 56000 | lm loss value: 1.499878E+00 | lm loss PPL: 4.481140E+00 |
-------------------------------------------------------------------------------------------
iteration 56200/ 152972 | consumed samples: 23694784 | consumed tokens: 48526917632 | elapsed time per iteration (ms): 7692.8 | learning rate: 1.556E-04 | global batch size: 512 | lm loss: 1.540735E+00 | loss scale: 32768.0 | grad norm: 10161.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 56400/ 152972 | consumed samples: 23797184 | consumed tokens: 48736632832 | elapsed time per iteration (ms): 6522.2 | learning rate: 1.553E-04 | global batch size: 512 | lm loss: 1.534103E+00 | loss scale: 32768.0 | grad norm: 3887.719 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 56600/ 152972 | consumed samples: 23899584 | consumed tokens: 48946348032 | elapsed time per iteration (ms): 6662.2 | learning rate: 1.549E-04 | global batch size: 512 | lm loss: 1.511724E+00 | loss scale: 32768.0 | grad norm: 4321.449 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 56800/ 152972 | consumed samples: 24001984 | consumed tokens: 49156063232 | elapsed time per iteration (ms): 6596.5 | learning rate: 1.546E-04 | global batch size: 512 | lm loss: 1.496831E+00 | loss scale: 32768.0 | grad norm: 4327.537 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 57000/ 152972 | consumed samples: 24104384 | consumed tokens: 49365778432 | elapsed time per iteration (ms): 6428.7 | learning rate: 1.542E-04 | global batch size: 512 | lm loss: 1.713227E+00 | loss scale: 8192.0 | grad norm: 924.725 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 57000 | lm loss value: 1.488713E+00 | lm loss PPL: 4.431388E+00 |
-------------------------------------------------------------------------------------------
saving checkpoint at iteration 57000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-24 13:52:25,167] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/mp_rank_00_model_states.pt
[2021-11-24 13:52:25,588] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_0_mp_rank_01_optim_states.pt
[2021-11-24 13:52:25,594] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_25_mp_rank_01_optim_states.pt
[2021-11-24 13:52:25,595] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-24 13:52:25,598] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-24 13:52:25,598] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-24 13:52:25,602] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-24 13:52:25,603] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-24 13:52:25,603] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-24 13:52:25,604] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-24 13:52:25,607] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-24 13:52:25,607] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-24 13:52:25,608] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-24 13:52:25,608] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-24 13:52:25,609] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-24 13:52:25,609] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-24 13:52:25,609] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-24 13:52:25,610] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-24 13:52:25,610] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-24 13:52:25,611] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-24 13:52:25,612] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-24 13:52:25,612] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-24 13:52:25,613] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-24 13:52:25,615] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-24 13:52:25,615] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-24 13:52:25,615] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-24 13:52:25,616] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-24 13:52:25,618] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-24 13:52:25,618] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-24 13:52:25,621] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-24 13:52:25,623] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-24 13:52:25,624] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-24 13:52:25,625] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-24 13:52:25,626] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-24 13:52:25,627] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-24 13:52:25,628] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-24 13:52:25,629] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-24 13:52:25,629] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-24 13:52:25,630] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-24 13:52:25,632] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-24 13:52:25,633] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-24 13:52:25,633] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-24 13:52:25,633] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-24 13:52:25,634] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-24 13:52:25,634] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-24 13:52:25,636] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-24 13:52:25,636] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-24 13:52:25,638] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-24 13:52:25,640] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-24 13:52:25,640] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-24 13:52:25,640] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-24 13:52:25,641] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-24 13:52:25,641] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-24 13:52:25,643] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-24 13:52:25,644] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-24 13:52:25,645] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-24 13:52:25,646] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-24 13:52:25,649] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-24 13:52:25,650] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-24 13:52:25,654] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_30_mp_rank_00_optim_states.pt
[2021-11-24 13:52:25,656] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_6_mp_rank_01_optim_states.pt
[2021-11-24 13:52:25,670] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_8_mp_rank_01_optim_states.pt
[2021-11-24 13:52:25,670] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_1_mp_rank_01_optim_states.pt
[2021-11-24 13:52:25,672] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_23_mp_rank_00_optim_states.pt
[2021-11-24 13:52:25,678] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step57000/zero_pp_rank_22_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 57000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2594.49
iteration 57200/ 152972 | consumed samples: 24206784 | consumed tokens: 49575493632 | elapsed time per iteration (ms): 7377.4 | learning rate: 1.538E-04 | global batch size: 512 | lm loss: 1.493214E+00 | loss scale: 8192.0 | grad norm: 1112.985 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 57400/ 152972 | consumed samples: 24309184 | consumed tokens: 49785208832 | elapsed time per iteration (ms): 6534.7 | learning rate: 1.535E-04 | global batch size: 512 | lm loss: 1.593234E+00 | loss scale: 8192.0 | grad norm: 865.623 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 57600/ 152972 | consumed samples: 24411584 | consumed tokens: 49994924032 | elapsed time per iteration (ms): 6084.1 | learning rate: 1.531E-04 | global batch size: 512 | lm loss: 1.573242E+00 | loss scale: 16384.0 | grad norm: 1974.097 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 57800/ 152972 | consumed samples: 24513984 | consumed tokens: 50204639232 | elapsed time per iteration (ms): 6577.7 | learning rate: 1.528E-04 | global batch size: 512 | lm loss: 1.540510E+00 | loss scale: 16384.0 | grad norm: 930.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-24 15:36:34,264] [INFO] [logging.py:68:log_dist] [Rank 0] step=58000, skipped=121, lr=[0.0001524025093197851, 0.0001524025093197851], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 58000 loss: 1.7951 iter time (s): 0.002 samples/sec: 215571.774
iteration 58000/ 152972 | consumed samples: 24616384 | consumed tokens: 50414354432 | elapsed time per iteration (ms): 6305.9 | learning rate: 1.524E-04 | global batch size: 512 | lm loss: 1.524496E+00 | loss scale: 32768.0 | grad norm: 3602.508 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 58000 | lm loss value: 1.477466E+00 | lm loss PPL: 4.381829E+00 |
-------------------------------------------------------------------------------------------
iteration 58200/ 152972 | consumed samples: 24718784 | consumed tokens: 50624069632 | elapsed time per iteration (ms): 6700.2 | learning rate: 1.520E-04 | global batch size: 512 | lm loss: 1.506342E+00 | loss scale: 32768.0 | grad norm: 1925.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 58400/ 152972 | consumed samples: 24821184 | consumed tokens: 50833784832 | elapsed time per iteration (ms): 6677.1 | learning rate: 1.517E-04 | global batch size: 512 | lm loss: 1.562879E+00 | loss scale: 32768.0 | grad norm: 3689.925 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 58500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-24 16:32:10,987] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/mp_rank_00_model_states.pt
[2021-11-24 16:32:11,412] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_19_mp_rank_00_optim_states.pt
[2021-11-24 16:32:11,414] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_23_mp_rank_00_optim_states.pt
[2021-11-24 16:32:11,415] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_6_mp_rank_01_optim_states.pt
[2021-11-24 16:32:11,417] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_28_mp_rank_01_optim_states.pt
[2021-11-24 16:32:11,418] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_14_mp_rank_00_optim_states.pt
[2021-11-24 16:32:11,420] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,421] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,421] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,422] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,423] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-24 16:32:11,423] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,423] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-24 16:32:11,425] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-24 16:32:11,425] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,425] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,426] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-24 16:32:11,426] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,427] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-24 16:32:11,428] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-24 16:32:11,428] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,429] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-24 16:32:11,430] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-24 16:32:11,431] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-24 16:32:11,431] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-24 16:32:11,431] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,431] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,435] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-24 16:32:11,436] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,441] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-24 16:32:11,442] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,446] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,446] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,447] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-24 16:32:11,447] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-24 16:32:11,448] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,449] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-24 16:32:11,450] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-24 16:32:11,450] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-24 16:32:11,453] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,453] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,454] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-24 16:32:11,454] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,454] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,455] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,455] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,460] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,461] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,462] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,462] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-24 16:32:11,463] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-24 16:32:11,464] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-24 16:32:11,464] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-24 16:32:11,464] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-24 16:32:11,468] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,473] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,473] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,475] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-24 16:32:11,476] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,477] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-24 16:32:11,478] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-24 16:32:11,481] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-24 16:32:11,483] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-24 16:32:11,484] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-24 16:32:11,485] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step58500/zero_pp_rank_29_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 58500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2643.47 iteration 58600/ 152972 | consumed samples: 24923584 | consumed tokens: 51043500032 | elapsed time per iteration (ms): 6499.2 | learning rate: 1.513E-04 | global batch size: 512 | lm loss: 1.550794E+00 | loss scale: 65536.0 | grad norm: 8074.711 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 58800/ 152972 | consumed samples: 25025984 | consumed tokens: 51253215232 | elapsed time per iteration (ms): 5662.3 | learning rate: 1.509E-04 | global batch size: 512 | lm loss: 1.578578E+00 | loss scale: 65536.0 | grad norm: 3548.055 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 59000/ 152972 | consumed samples: 25128384 | consumed tokens: 51462930432 | elapsed time per iteration (ms): 5904.6 | learning rate: 1.506E-04 | global batch size: 512 | lm loss: 1.467491E+00 | loss scale: 131072.0 | grad norm: 10555.612 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------- valid loss at iteration 59000 | lm loss value: 1.494197E+00 | lm loss PPL: 4.455757E+00 | 
------------------------------------------------------------------------------------------- iteration 59200/ 152972 | consumed samples: 25230784 | consumed tokens: 51672645632 | elapsed time per iteration (ms): 7677.1 | learning rate: 1.502E-04 | global batch size: 512 | lm loss: 1.552111E+00 | loss scale: 131072.0 | grad norm: 14764.760 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 59400/ 152972 | consumed samples: 25333184 | consumed tokens: 51882360832 | elapsed time per iteration (ms): 5811.7 | learning rate: 1.498E-04 | global batch size: 512 | lm loss: 1.535997E+00 | loss scale: 65536.0 | grad norm: 4951.055 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 59600/ 152972 | consumed samples: 25435584 | consumed tokens: 52092076032 | elapsed time per iteration (ms): 6522.4 | learning rate: 1.495E-04 | global batch size: 512 | lm loss: 1.513916E+00 | loss scale: 8192.0 | grad norm: 789.435 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 59800/ 152972 | consumed samples: 25537984 | consumed tokens: 52301791232 | elapsed time per iteration (ms): 5507.8 | learning rate: 1.491E-04 | global batch size: 512 | lm loss: 1.474538E+00 | loss scale: 8192.0 | grad norm: 802.893 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [2021-11-24 19:05:14,843] [INFO] [logging.py:68:log_dist] [Rank 0] step=60000, skipped=126, lr=[0.0001487418636335107, 0.0001487418636335107], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 60000 loss: 1.8095 iter time (s): 0.004 samples/sec: 135451.330 iteration 60000/ 152972 | consumed samples: 25640384 | consumed tokens: 52511506432 | elapsed time per iteration (ms): 5640.4 | learning rate: 1.487E-04 | global batch size: 512 | lm loss: 1.497443E+00 | loss scale: 8192.0 | grad norm: 921.749 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
------------------------------------------------------------------------------------------- valid loss at iteration 60000 | lm loss value: 1.470346E+00 | lm loss PPL: 4.350742E+00 | ------------------------------------------------------------------------------------------- saving checkpoint at iteration 60000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-24 19:10:20,176] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/mp_rank_00_model_states.pt [2021-11-24 19:10:20,596] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,599] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,600] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,602] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,603] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,603] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,604] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,605] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,605] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,607] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,607] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,609] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,609] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,611] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,611] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,612] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,613] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,614] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,615] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,615] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,616] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,616] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,616] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,617] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,618] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,621] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,621] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,622] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,623] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,628] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,634] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,635] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,635] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,635] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,637] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,639] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,639] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,640] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,641] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,641] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,642] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,642] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,642] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,643] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,643] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,643] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,643] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,643] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,643] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,643] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,644] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,644] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,646] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,646] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,646] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,647] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,648] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,648] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,648] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,649] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,650] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,651] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-24 19:10:20,655] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-24 19:10:20,668] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step60000/zero_pp_rank_31_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 60000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2648.87 iteration 60200/ 152972 | consumed samples: 25742784 | consumed tokens: 52721221632 | elapsed time per iteration (ms): 7360.0 | learning rate: 1.484E-04 | global batch size: 512 | lm loss: 1.503263E+00 | loss scale: 16384.0 | grad norm: 1442.928 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 60400/ 152972 | consumed samples: 25845184 | consumed tokens: 52930936832 | elapsed time per iteration (ms): 5693.3 | learning rate: 1.480E-04 | global batch size: 512 | lm loss: 1.593676E+00 | loss scale: 16384.0 | grad norm: 1929.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 60600/ 152972 | consumed samples: 25947584 | consumed tokens: 53140652032 | elapsed time per iteration (ms): 5377.4 | learning rate: 1.476E-04 | global batch size: 512 | lm loss: 1.510522E+00 | loss scale: 32768.0 | grad norm: 4639.440 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 60800/ 152972 | consumed samples: 26049984 | consumed tokens: 53350367232 | elapsed time per iteration (ms): 5154.7 | learning rate: 1.472E-04 | global batch size: 512 | lm loss: 1.494777E+00 | loss scale: 
32768.0 | grad norm: 3300.663 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 61000/ 152972 | consumed samples: 26152384 | consumed tokens: 53560082432 | elapsed time per iteration (ms): 5131.3 | learning rate: 1.469E-04 | global batch size: 512 | lm loss: 1.500096E+00 | loss scale: 32768.0 | grad norm: 3083.472 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------- valid loss at iteration 61000 | lm loss value: 1.506323E+00 | lm loss PPL: 4.510118E+00 | ------------------------------------------------------------------------------------------- iteration 61200/ 152972 | consumed samples: 26254784 | consumed tokens: 53769797632 | elapsed time per iteration (ms): 6214.0 | learning rate: 1.465E-04 | global batch size: 512 | lm loss: 1.517281E+00 | loss scale: 65536.0 | grad norm: 5694.874 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 61400/ 152972 | consumed samples: 26357184 | consumed tokens: 53979512832 | elapsed time per iteration (ms): 5046.8 | learning rate: 1.461E-04 | global batch size: 512 | lm loss: 1.484968E+00 | loss scale: 32768.0 | grad norm: 3723.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | saving checkpoint at iteration 61500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-24 21:26:47,759] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/mp_rank_00_model_states.pt [2021-11-24 21:26:48,188] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,188] [INFO] 
[engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,189] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,190] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,192] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,192] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,192] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,193] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,196] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,196] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,196] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,196] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,197] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,200] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,200] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,200] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,201] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,201] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,205] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,206] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,206] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,207] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,207] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,209] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,210] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,211] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,215] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,216] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,218] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,218] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,219] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,219] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,220] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,222] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,222] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,223] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,224] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,226] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,227] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,227] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,227] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,227] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,229] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,230] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,231] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,231] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,233] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,233] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,234] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,234] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,235] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,236] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,239] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,240] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,240] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,241] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,243] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,245] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,247] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,250] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,255] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,259] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-24 21:26:48,262] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-24 21:26:48,269] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step61500/zero_pp_rank_12_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 61500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2660.24 iteration 61600/ 152972 | consumed samples: 26459584 | consumed tokens: 54189228032 | elapsed time per iteration (ms): 4940.4 | learning rate: 1.457E-04 | global batch size: 512 | lm loss: 1.513209E+00 | loss scale: 32768.0 | grad norm: 2539.792 | 
num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 61800/ 152972 | consumed samples: 26561984 | consumed tokens: 54398943232 | elapsed time per iteration (ms): 4908.1 | learning rate: 1.454E-04 | global batch size: 512 | lm loss: 1.499277E+00 | loss scale: 65536.0 | grad norm: 7339.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-24 22:07:30,108] [INFO] [logging.py:68:log_dist] [Rank 0] step=62000, skipped=128, lr=[0.0001449911532184592, 0.0001449911532184592], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 62000 loss: 1.3634 iter time (s): 0.002 samples/sec: 214576.103
iteration 62000/ 152972 | consumed samples: 26664384 | consumed tokens: 54608658432 | elapsed time per iteration (ms): 4850.4 | learning rate: 1.450E-04 | global batch size: 512 | lm loss: 1.529963E+00 | loss scale: 65536.0 | grad norm: 6457.547 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 62000 | lm loss value: 1.543600E+00 | lm loss PPL: 4.681415E+00 |
-------------------------------------------------------------------------------------------
iteration 62200/ 152972 | consumed samples: 26766784 | consumed tokens: 54818373632 | elapsed time per iteration (ms): 6560.9 | learning rate: 1.446E-04 | global batch size: 512 | lm loss: 1.532090E+00 | loss scale: 65536.0 | grad norm: 8908.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 62400/ 152972 | consumed samples: 26869184 | consumed tokens: 55028088832 | elapsed time per iteration (ms): 5698.7 | learning rate: 1.442E-04 | global batch size: 512 | lm loss: 1.523673E+00 | loss scale: 131072.0 | grad norm: 12396.184 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 62600/ 152972 | consumed samples: 26971584 | consumed tokens: 55237804032 | elapsed time per iteration (ms): 5231.6 | learning rate: 1.438E-04 | global batch size: 512 | lm loss: 1.472040E+00 | loss scale: 131072.0 | grad norm: 13344.798 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 62800/ 152972 | consumed samples: 27073984 | consumed tokens: 55447519232 | elapsed time per iteration (ms): 5468.1 | learning rate: 1.435E-04 | global batch size: 512 | lm loss: 1.559808E+00 | loss scale: 131072.0 | grad norm: 13914.314 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 63000/ 152972 | consumed samples: 27176384 | consumed tokens: 55657234432 | elapsed time per iteration (ms): 4923.0 | learning rate: 1.431E-04 | global batch size: 512 | lm loss: 1.431124E+00 | loss scale: 131072.0 | grad norm: 11412.861 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 63000 | lm loss value: 1.449958E+00 | lm loss PPL: 4.262934E+00 |
-------------------------------------------------------------------------------------------
saving checkpoint at iteration 63000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-24 23:43:37,053] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/mp_rank_00_model_states.pt
[2021-11-24 23:43:37,478] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_25_mp_rank_01_optim_states.pt
[2021-11-24 23:43:37,479] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-24 23:43:37,479] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-24 23:43:37,479] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-24 23:43:37,480] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-24 23:43:37,486] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-24 23:43:37,488] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-24 23:43:37,489] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-24 23:43:37,490] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-24 23:43:37,490] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-24 23:43:37,490] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-24 23:43:37,491] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-24 23:43:37,491] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-24 23:43:37,492] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-24 23:43:37,492] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-24 23:43:37,493] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-24 23:43:37,494] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-24 23:43:37,494] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-24 23:43:37,495] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-24 23:43:37,496] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-24 23:43:37,498] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-24 23:43:37,499] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-24 23:43:37,499] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-24 23:43:37,502] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-24 23:43:37,504] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-24 23:43:37,506] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-24 23:43:37,506] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-24 23:43:37,507] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-24 23:43:37,511] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-24 23:43:37,512] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-24 23:43:37,514] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-24 23:43:37,515] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-24 23:43:37,516] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-24 23:43:37,517] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-24 23:43:37,517] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-24 23:43:37,518] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-24 23:43:37,518] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-24 23:43:37,518] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-24 23:43:37,519] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-24 23:43:37,519] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-24 23:43:37,520] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-24 23:43:37,521] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-24 23:43:37,522] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-24 23:43:37,522] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-24 23:43:37,523] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-24 23:43:37,523] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-24 23:43:37,524] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-24 23:43:37,524] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-24 23:43:37,524] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-24 23:43:37,526] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-24 23:43:37,526] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-24 23:43:37,527] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-24 23:43:37,528] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-24 23:43:37,530] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-24 23:43:37,530] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-24 23:43:37,530] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-24 23:43:37,533] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-24 23:43:37,534] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_13_mp_rank_00_optim_states.pt
[2021-11-24 23:43:37,537] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_19_mp_rank_00_optim_states.pt
[2021-11-24 23:43:37,542] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_14_mp_rank_01_optim_states.pt
[2021-11-24 23:43:37,543] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_31_mp_rank_00_optim_states.pt
[2021-11-24 23:43:37,543] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_23_mp_rank_00_optim_states.pt
[2021-11-24 23:43:37,550] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_18_mp_rank_00_optim_states.pt
[2021-11-24 23:43:37,558] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step63000/zero_pp_rank_9_mp_rank_01_optim_states.pt
successfully saved checkpoint at iteration 63000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2698.90
iteration 63200/ 152972 | consumed samples: 27278784 | consumed tokens: 55866949632 | elapsed time per iteration (ms): 5887.8 | learning rate: 1.427E-04 | global batch size: 512 | lm loss: 1.536530E+00 | loss scale: 262144.0 | grad norm: 12719.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 63400/ 152972 | consumed samples: 27381184 | consumed tokens: 56076664832 | elapsed time per iteration (ms): 4851.8 | learning rate: 1.423E-04 | global batch size: 512 | lm loss: 1.546012E+00 | loss scale: 65536.0 | grad norm: 6524.787 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 63600/ 152972 | consumed samples: 27483584 | consumed tokens: 56286380032 | elapsed time per iteration (ms): 4787.5 | learning rate: 1.419E-04 | global batch size: 512 | lm loss: 1.536172E+00 | loss scale: 8192.0 | grad norm: 1019.702 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 63800/ 152972 | consumed samples: 27585984 | consumed tokens: 56496095232 | elapsed time per iteration (ms): 4810.5 | learning rate: 1.416E-04 | global batch size: 512 | lm loss: 1.520679E+00 | loss scale: 8192.0 | grad norm: 898.550 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-25 01:04:12,720] [INFO] [logging.py:68:log_dist] [Rank 0] step=64000, skipped=135, lr=[0.00014117274240234352, 0.00014117274240234352], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 64000 loss: 1.0482 iter time (s): 0.002 samples/sec: 215493.098
iteration 64000/ 152972 | consumed samples: 27688384 | consumed tokens: 56705810432 | elapsed time per iteration (ms): 4793.1 | learning rate: 1.412E-04 | global batch size: 512 | lm loss: 1.530510E+00 | loss scale: 8192.0 | grad norm: 678.833 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 64000 | lm loss value: 1.539459E+00 | lm loss PPL: 4.662067E+00 |
-------------------------------------------------------------------------------------------
saving checkpoint at iteration 64156 to
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-25 01:18:50,992] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/mp_rank_00_model_states.pt [2021-11-25 01:18:51,414] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,418] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,421] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,423] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,423] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,424] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,424] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-25 
01:18:51,425] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,425] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,425] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,425] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,426] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,427] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,428] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,428] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,430] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,431] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,432] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,432] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,432] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,433] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,434] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,435] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,435] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,437] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,437] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,439] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,443] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,445] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,445] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,445] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,450] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,450] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,451] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,451] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,452] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,452] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,452] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,456] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,456] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,457] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,457] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,458] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,458] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,459] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,460] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,460] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,461] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,461] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,462] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,462] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,464] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,464] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,464] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,467] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,468] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,468] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,468] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,470] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,471] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,471] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,473] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-25 01:18:51,476] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-25 01:18:51,476] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64156/zero_pp_rank_17_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 64156 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2717.59
[exiting program after 1190.0175973455111 minutes] datetime: 2021-11-25 01:18:51
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
compatible ..compatible-------------------------------------------------- --------------------------------------------------compatible-------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam cpu_adam...............[YES]cpu_adam [YES]............... ..................... ...... [OKAY][YES] [YES] [OKAY] ...... ...... [OKAY][OKAY] fused_adamfused_adam ..........................fused_adam fused_adam[YES] [YES] ................... ............. ......[YES] [OKAY][OKAY] [YES] ............ fused_lamb[OKAY][OKAY]fused_lamb ..........................fused_lamb [YES] [YES] fused_lamb............. ......................... [OKAY][OKAY][YES][YES] ...... ......[OKAY] [OKAY] sparse_attn ............ sparse_attn[NO] sparse_attn sparse_attn................... [NO]............ ...................[OKAY] [NO]transformer[OKAY][NO] .......................... transformer [YES][OKAY][OKAY] ..................transformer transformer [OKAY] [YES]........................ ......[YES][YES] stochastic_transformer [OKAY]...... ....... [OKAY][YES][OKAY] stochastic_transformer...... .[OKAY] [YES]stochastic_transformerstochastic_transformer ....... . [OKAY] [YES] [YES] ............ [OKAY][OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja --------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op name op name ................op name ................ ................ installedinstalled................ installed ....installed ..compatiblecompatible .. compatible -------------------------------------------------- compatible-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adam cpu_adamcpu_adam............... cpu_adam...............[YES]............... ...............[YES][YES]...... ......[YES][OKAY]...... [OKAY]......[OKAY] [OKAY] fused_adamfused_adamfused_adam fused_adam ............. ............. ............. .............[YES] [YES] [YES] [YES] ............ 
............[OKAY] [OKAY][OKAY][OKAY] fused_lamb .............fused_lambfused_lamb fused_lamb [YES]............. ............. ...... .............[YES] [YES] [OKAY] [YES]...... ...... ...... [OKAY] [OKAY] [OKAY] sparse_attn ............sparse_attn sparse_attnsparse_attn[NO] ............ ............................... [NO] [OKAY][NO] [NO] ....... ..............[OKAY] transformer [OKAY] [OKAY] ............ transformer [YES]............ transformer transformer...... [YES] ............[OKAY].................. [YES][YES][OKAY] ......stochastic_transformer...... [OKAY].stochastic_transformer [OKAY] [YES].stochastic_transformer ......stochastic_transformer.[YES] [OKAY].......[YES] [YES][OKAY]...... ......[OKAY] [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... 
[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. 
async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.---------------------------------------------------------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaJIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................. ...................................................... [OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op nameop name op name op name................................ installed................installed ................ .. .. installedinstalled compatible compatible .... -------------------------------------------------- -------------------------------------------------- compatible compatible ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam............... cpu_adam ...............cpu_adam[YES] ...............[YES]..................... [YES][YES]......[OKAY] ............[OKAY] [OKAY][OKAY] fused_adam .............fused_adam [YES] fused_adam fused_adam .......................... ...... ............. [YES][YES] [OKAY] [YES]............ ......[OKAY][OKAY] fused_lamb [OKAY]............. 
fused_lambfused_lamb [YES]fused_lamb.......................... ......[YES]............. [YES] [OKAY]............[YES] [OKAY]......[OKAY] [OKAY] sparse_attn ............sparse_attnsparse_attn sparse_attn[NO]............ ............ ............[NO] .......[NO][NO] .......[OKAY] ....... ....... [OKAY] [OKAY] [OKAY] transformer ............transformertransformer transformer [YES] ............ ........................ ...... [YES][YES][YES] ......[OKAY]............ [OKAY][OKAY][OKAY] stochastic_transformer stochastic_transformer.stochastic_transformerstochastic_transformer . . [YES].[YES] [YES]......[YES] ...... ...... ......[OKAY][OKAY] [OKAY] [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... 
[OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_ioasync_io .............................. [NO][NO] ....... .......[NO] [NO] transformer_inference ..transformer_inference [NO] ......... [NO][OKAY] ....... [OKAY] utils .................. [YES]utils ........................ [OKAY][YES] ...... quantizer[OKAY] .............. [NO] quantizer....... ..............[OKAY] [NO] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... 
[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... 
[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference ..  
[WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.[NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] async_io quantizer............... ..............[NO] [NO]....... .......[NO] [OKAY] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] DeepSpeed general environment info: utils utils.................. ..................[YES] [YES]...... ......[OKAY] [OKAY] quantizerquantizer ............................ [NO][NO] .............. 
[OKAY][OKAY] ----------------------------------------------------------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 
0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.21.8.2 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed infodeepspeed info ...................................... 0.5.5+58a8e13, 58a8e13, master0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... 
............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.2 1.8.2 torch cuda version torch cuda version............... ...............11.1 11.1 nvcc version nvcc version..................... .....................11.2 11.2deepspeed install path deepspeed install path........... ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']deepspeed info deepspeed info................... ...................0.5.5+58a8e13, 58a8e13, master 0.5.5+58a8e13, 58a8e13, masterdeepspeed wheel compiled w. deepspeed wheel compiled w....... torch 1.8, cuda 11.1...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. 
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
using world size: 64, data-parallel-size: 32, tensor-model-parallel size: 1, pipeline-model-parallel size: 2
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. None
consumed_train_samples .......................... 0
consumed_train_tokens ........................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
curriculum_learning ............................. False
data_impl ....................................... mmap
data_parallel_size .............................. 32
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.2190200.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 100
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 8192
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
gigaflos_no_embeds .............................. 0
global_batch_size ............................... 512
glu_activation .................................. None
hidden_dropout .................................. 0.1
hidden_size ..................................... 2048
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 128
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 200
log_learning_rate_to_tensorboard ................ True
log_level ....................................... None
log_level_replica ............................... None
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_on_targets_only ............................ True
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 0.0002
lr_decay_iters .................................. None
lr_decay_samples ................................ 73242187
lr_decay_style .................................. cosine
lr_decay_tokens ................................. None
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 183105
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/data/gpt2-merges.txt
micro_batch_size ................................ 1
min_loss_scale .................................. 1.0
min_lr .......................................... 1e-05
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 16
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 24
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 2
position_embedding_type ......................... PositionEmbeddingType.absolute
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['32', '32', '2_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
reweight_loss_based_on_position_frequency ....... True
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 1234
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 1
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/tr6g-1B3-oscar-loss-reweighting-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
test_weighted_split_names ....................... None
test_weighted_split_paths ....................... None
test_weighted_split_splits ...................... None
test_weighted_split_weights ..................... None
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 73242187
train_tokens .................................... None
train_weighted_split_paths ...................... None
use_bnb_optimizer ............................... False
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
valid_weighted_split_names ...................... None
valid_weighted_split_paths ...................... None
valid_weighted_split_splits ..................... None
valid_weighted_split_weights .................... None
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 64
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
> building GPT2BPETokenizer tokenizer ...
> setting tensorboard ...
> padded vocab (size: 50257) with 47 dummy tokens (new size: 50304)
> initializing torch distributed ...
> initializing tensor model parallel with size 1
> initializing pipeline model parallel with size 2
> setting random seeds to 1234 ...
[2021-11-25 01:20:10,295] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.371 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module scaled_upper_triang_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module scaled_upper_triang_masked_softmax_cuda... /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module scaled_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module scaled_masked_softmax_cuda... /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module fused_mix_prec_layer_norm_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module fused_mix_prec_layer_norm_cuda... /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension.
Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( >>> done with compiling and loading fused kernels. Compilation time: 23.633 seconds time to initialize megatron (seconds): 82.302 [after megatron is initialized] datetime: 2021-11-25 01:20:34 building GPT model ... [2021-11-25 01:20:34,351] [INFO] [utils.py:806:see_memory_usage] Before Building Model [2021-11-25 01:20:34,352] [INFO] [utils.py:807:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB [2021-11-25 01:20:34,352] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 38.79 GB, percent = 20.7% SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pipe=0, data=23, model=0): 23, 
ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=1, data=0, model=0): 32, ProcessCoord(pipe=1, data=1, model=0): 33, ProcessCoord(pipe=1, data=2, model=0): 34, ProcessCoord(pipe=1, data=3, model=0): 35, ProcessCoord(pipe=1, data=4, model=0): 36, ProcessCoord(pipe=1, data=5, model=0): 37, ProcessCoord(pipe=1, data=6, model=0): 38, ProcessCoord(pipe=1, data=7, model=0): 39, ProcessCoord(pipe=1, data=8, model=0): 40, ProcessCoord(pipe=1, data=9, model=0): 41, ProcessCoord(pipe=1, data=10, model=0): 42, ProcessCoord(pipe=1, data=11, model=0): 43, ProcessCoord(pipe=1, data=12, model=0): 44, ProcessCoord(pipe=1, data=13, model=0): 45, ProcessCoord(pipe=1, data=14, model=0): 46, ProcessCoord(pipe=1, data=15, model=0): 47, ProcessCoord(pipe=1, data=16, model=0): 48, ProcessCoord(pipe=1, data=17, model=0): 49, ProcessCoord(pipe=1, data=18, model=0): 50, ProcessCoord(pipe=1, data=19, model=0): 51, ProcessCoord(pipe=1, data=20, model=0): 52, ProcessCoord(pipe=1, data=21, model=0): 53, ProcessCoord(pipe=1, data=22, model=0): 54, ProcessCoord(pipe=1, data=23, model=0): 55, ProcessCoord(pipe=1, data=24, model=0): 56, ProcessCoord(pipe=1, data=25, model=0): 57, ProcessCoord(pipe=1, data=26, model=0): 58, ProcessCoord(pipe=1, data=27, model=0): 59, ProcessCoord(pipe=1, data=28, model=0): 60, ProcessCoord(pipe=1, data=29, model=0): 61, ProcessCoord(pipe=1, data=30, model=0): 62, ProcessCoord(pipe=1, data=31, model=0): 63} [2021-11-25 01:20:35,649] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer stage=0 layers=15 0: _to_float16 1: EmbeddingPipe 2: 3: ParallelTransformerLayerPipe 4: ParallelTransformerLayerPipe 5: 
ParallelTransformerLayerPipe 6: ParallelTransformerLayerPipe 7: ParallelTransformerLayerPipe 8: ParallelTransformerLayerPipe 9: ParallelTransformerLayerPipe 10: ParallelTransformerLayerPipe 11: ParallelTransformerLayerPipe 12: ParallelTransformerLayerPipe 13: ParallelTransformerLayerPipe 14: ParallelTransformerLayerPipe stage=1 layers=17 15: ParallelTransformerLayerPipe 16: ParallelTransformerLayerPipe 17: ParallelTransformerLayerPipe 18: ParallelTransformerLayerPipe 19: ParallelTransformerLayerPipe 20: ParallelTransformerLayerPipe 21: ParallelTransformerLayerPipe 22: ParallelTransformerLayerPipe 23: ParallelTransformerLayerPipe 24: ParallelTransformerLayerPipe 25: ParallelTransformerLayerPipe 26: ParallelTransformerLayerPipe 27: 28: 29: MixedFusedLayerNorm 30: EmbeddingPipe 31: float16_to_fp32 loss: CrossEntropy > number of parameters on (tensor, pipeline) model parallel rank (0, 1): 711520256 [2021-11-25 01:20:36,185] [INFO] [utils.py:806:see_memory_usage] After Building Model [2021-11-25 01:20:36,186] [INFO] [utils.py:807:see_memory_usage] MA 1.33 GB Max_MA 1.33 GB CA 1.36 GB Max_CA 1 GB [2021-11-25 01:20:36,186] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 39.01 GB, percent = 20.8% > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 711516160 setting training iterations to 152972 > learning rate decay style: cosine DeepSpeed is enabled. 
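Note: the "Using topology" mapping printed above follows a simple pattern for this run (2 pipeline stages, 32 data-parallel replicas, tensor model parallel size 1): the global rank equals pipe * 32 + data. The sketch below rebuilds that mapping. The helper name and the tuple keys are hypothetical stand-ins for the logged ProcessCoord entries, and the formula is inferred from the logged values only, not taken from DeepSpeed's source.

```python
PIPE_STAGES = 2          # pipeline model parallel size from the log
DATA_PARALLEL = 32       # world_size 64 / (2 pipeline stages * 1 tensor rank)


def global_rank(pipe: int, data: int, data_parallel: int = DATA_PARALLEL) -> int:
    """Rank of the process at coordinate (pipe, data), with model=0 throughout."""
    return pipe * data_parallel + data


# Rebuild the logged mapping as {(pipe, data): rank}.
topology = {
    (pipe, data): global_rank(pipe, data)
    for pipe in range(PIPE_STAGES)
    for data in range(DATA_PARALLEL)
}

print(topology[(0, 0)])    # 0
print(topology[(1, 0)])    # 32
print(topology[(1, 31)])   # 63
```

Read this way, ranks 0 to 31 hold pipeline stage 0 and ranks 32 to 63 hold stage 1.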
[2021-11-25 01:20:36,206] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.5.5+58a8e13, git-hash=58a8e13, git-branch=master [2021-11-25 01:20:36,527] [INFO] [engine.py:207:__init__] DeepSpeed Flops Profiler Enabled: False [2021-11-25 01:20:36,527] [INFO] [engine.py:862:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer [2021-11-25 01:20:36,527] [INFO] [engine.py:868:_configure_optimizer] Using client Optimizer as basic optimizer [2021-11-25 01:20:36,531] [INFO] [engine.py:884:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam [2021-11-25 01:20:36,532] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type= [2021-11-25 01:20:36,532] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer [2021-11-25 01:20:36,532] [INFO] [stage2.py:111:__init__] Reduce bucket size 500000000 [2021-11-25 01:20:36,532] [INFO] [stage2.py:112:__init__] Allgather bucket size 500000000 [2021-11-25 01:20:36,532] [INFO] [stage2.py:113:__init__] CPU Offload: False [2021-11-25 01:20:36,532] [INFO] [stage2.py:114:__init__] Round robin gradient partitioning: False Rank: 50 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 63 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 34 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 58 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 55 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 7 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 33 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 47 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 11 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 49 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 56 partition count [32, 
32] and sizes[(22224896, False), (10112, False)] Rank: 18 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 53 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 45 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 40 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 1 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 42 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 4 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 12 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 46 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 17 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 22 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 8 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 61 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 36 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 25 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 26 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 15 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 21 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 37 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 29 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 30 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 44 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 41 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 32 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 52 partition count [32, 32] and sizes[(22224896, False), (10112, 
False)] Rank: 48 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 57 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 60 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 51 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 9 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 10 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 43 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 35 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 62 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 13 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 38 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 16 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 20 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 6 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 19 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 5 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 54 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 39 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 0 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 3 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 59 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 23 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 24 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 28 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 2 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 14 partition count [32, 32] 
and sizes[(22224896, False), (9984, False)] Rank: 31 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 27 partition count [32, 32] and sizes[(22224896, False), (9984, False)] [2021-11-25 01:20:38,106] [INFO] [utils.py:806:see_memory_usage] Before initializing optimizer states [2021-11-25 01:20:38,106] [INFO] [utils.py:807:see_memory_usage] MA 1.41 GB Max_MA 1.45 GB CA 2.71 GB Max_CA 3 GB [2021-11-25 01:20:38,107] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.79 GB, percent = 21.8% [2021-11-25 01:20:38,139] [INFO] [utils.py:806:see_memory_usage] After initializing optimizer states [2021-11-25 01:20:38,140] [INFO] [utils.py:807:see_memory_usage] MA 1.57 GB Max_MA 1.66 GB CA 2.96 GB Max_CA 3 GB [2021-11-25 01:20:38,140] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.92 GB, percent = 21.9% [2021-11-25 01:20:38,140] [INFO] [stage2.py:474:__init__] optimizer state initialized [2021-11-25 01:20:38,167] [INFO] [utils.py:806:see_memory_usage] After initializing ZeRO optimizer [2021-11-25 01:20:38,168] [INFO] [utils.py:807:see_memory_usage] MA 1.57 GB Max_MA 1.57 GB CA 2.96 GB Max_CA 3 GB [2021-11-25 01:20:38,168] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.94 GB, percent = 21.9% [2021-11-25 01:20:38,168] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [2021-11-25 01:20:38,168] [INFO] [engine.py:599:_configure_lr_scheduler] DeepSpeed using client LR scheduler [2021-11-25 01:20:38,168] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2021-11-25 01:20:38,168] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)] [2021-11-25 01:20:38,168] [INFO] [config.py:940:print] DeepSpeedEngine configuration: [2021-11-25 01:20:38,168] [INFO] [config.py:944:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, 
"number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2021-11-25 01:20:38,168] [INFO] [config.py:944:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] allreduce_always_fp32 ........ False [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] amp_enabled .................. False [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] amp_params ................... False [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] checkpoint_tag_validation_enabled True [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] checkpoint_tag_validation_fail False [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] curriculum_enabled ........... False [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] curriculum_params ............ False [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] dataloader_drop_last ......... False [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] disable_allgather ............ False [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] dump_state ................... False [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1} [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] eigenvalue_enabled ........... False [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] eigenvalue_gas_boundary_resolution 1 [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] eigenvalue_layer_name ........ bert.encoder.layer [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] eigenvalue_layer_num ......... 0 [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] eigenvalue_max_iter .......... 100 [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] eigenvalue_stability ......... 
1e-06 [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] eigenvalue_tol ............... 0.01 [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] eigenvalue_verbose ........... False [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] elasticity_enabled ........... False [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] fp16_enabled ................. True [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] fp16_master_weights_and_gradients False [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] fp16_mixed_quantize .......... False [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] global_rank .................. 0 [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] gradient_accumulation_steps .. 16 [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] gradient_clipping ............ 1.0 [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] gradient_predivide_factor .... 1.0 [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] initial_dynamic_scale ........ 4096 [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] loss_scale ................... 0 [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] memory_breakdown ............. False [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] optimizer_legacy_fusion ...... False [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] optimizer_name ............... None [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] optimizer_params ............. None [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] pld_enabled .................. 
False [2021-11-25 01:20:38,169] [INFO] [config.py:944:print] pld_params ................... False [2021-11-25 01:20:38,170] [INFO] [config.py:944:print] prescale_gradients ........... False [2021-11-25 01:20:38,170] [INFO] [config.py:944:print] quantize_change_rate ......... 0.001 [2021-11-25 01:20:38,170] [INFO] [config.py:944:print] quantize_groups .............. 1 [2021-11-25 01:20:38,170] [INFO] [config.py:944:print] quantize_offset .............. 1000 [2021-11-25 01:20:38,170] [INFO] [config.py:944:print] quantize_period .............. 1000 [2021-11-25 01:20:38,170] [INFO] [config.py:944:print] quantize_rounding ............ 0 [2021-11-25 01:20:38,170] [INFO] [config.py:944:print] quantize_start_bits .......... 16 [2021-11-25 01:20:38,170] [INFO] [config.py:944:print] quantize_target_bits ......... 8 [2021-11-25 01:20:38,170] [INFO] [config.py:944:print] quantize_training_enabled .... False [2021-11-25 01:20:38,170] [INFO] [config.py:944:print] quantize_type ................ 0 [2021-11-25 01:20:38,170] [INFO] [config.py:944:print] quantize_verbose ............. False [2021-11-25 01:20:38,170] [INFO] [config.py:944:print] scheduler_name ............... None [2021-11-25 01:20:38,170] [INFO] [config.py:944:print] scheduler_params ............. None [2021-11-25 01:20:38,170] [INFO] [config.py:944:print] sparse_attention ............. None [2021-11-25 01:20:38,170] [INFO] [config.py:944:print] sparse_gradients_enabled ..... False [2021-11-25 01:20:38,170] [INFO] [config.py:944:print] steps_per_print .............. 2000 [2021-11-25 01:20:38,170] [INFO] [config.py:944:print] tensorboard_enabled .......... False [2021-11-25 01:20:38,170] [INFO] [config.py:944:print] tensorboard_job_name ......... DeepSpeedJobName [2021-11-25 01:20:38,170] [INFO] [config.py:944:print] tensorboard_output_path ...... [2021-11-25 01:20:38,170] [INFO] [config.py:944:print] train_batch_size ............. 
512 [2021-11-25 01:20:38,170] [INFO] [config.py:944:print] train_micro_batch_size_per_gpu 1 [2021-11-25 01:20:38,170] [INFO] [config.py:944:print] use_quantizer_kernel ......... False [2021-11-25 01:20:38,170] [INFO] [config.py:944:print] wall_clock_breakdown ......... False [2021-11-25 01:20:38,170] [INFO] [config.py:944:print] world_size ................... 32 [2021-11-25 01:20:38,170] [INFO] [config.py:944:print] zero_allow_untested_optimizer False [2021-11-25 01:20:38,170] [INFO] [config.py:944:print] zero_config .................. { "stage": 1, "contiguous_gradients": true, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false } [2021-11-25 01:20:38,170] [INFO] [config.py:944:print] zero_enabled ................. True [2021-11-25 01:20:38,170] [INFO] [config.py:944:print] zero_optimization_stage ...... 
1 [2021-11-25 01:20:38,170] [INFO] [config.py:946:print] json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 512, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false } [2021-11-25 01:20:38,170] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=16 micro_batch_size=1 [2021-11-25 01:20:38,199] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=15 [0, 15) STAGE_PARAMS=711516160 (711.516M) TOTAL_PARAMS=1423036416 (1423.036M) UNIQUE_PARAMS=1315819520 (1315.820M) [2021-11-25 01:20:38,199] [INFO] [engine.py:151:__init__] RANK=32 STAGE=1 LAYERS=17 [15, 32) STAGE_PARAMS=711520256 (711.520M) TOTAL_PARAMS=1423036416 (1423.036M) UNIQUE_PARAMS=1315819520 (1315.820M) > using checkpoint value 0.0002 for learning rate > using checkpoint value 1e-05 for minimum learning rate > using checkpoint value 183105 for warmup iterations > using checkpoint value 73242187 for total number of iterations > using checkpoint value cosine for decay style successfully loaded 32 ZeRO state_dicts for rank 60 successfully loaded 32 ZeRO state_dicts for rank 61 successfully loaded 32 ZeRO state_dicts for rank 63 successfully loaded 32 ZeRO state_dicts for rank 62 successfully loaded 32 ZeRO state_dicts for rank 56 successfully loaded 32 ZeRO state_dicts for rank 58 successfully loaded 32 ZeRO state_dicts for rank 57 successfully loaded 32 ZeRO state_dicts for rank 59 successfully loaded 32 ZeRO state_dicts for rank 32 successfully loaded 32 ZeRO state_dicts for rank 34 successfully loaded 32 ZeRO state_dicts for rank 51 successfully loaded 32 ZeRO state_dicts for rank 35 successfully loaded 32 ZeRO state_dicts for rank 50 successfully loaded 32 ZeRO state_dicts for rank 33 successfully loaded 32 ZeRO state_dicts for rank 49 successfully loaded 32 ZeRO state_dicts for rank 48 
successfully loaded 32 ZeRO state_dicts for rank 40 successfully loaded 32 ZeRO state_dicts for rank 43 successfully loaded 32 ZeRO state_dicts for rank 41 successfully loaded 32 ZeRO state_dicts for rank 42 successfully loaded 32 ZeRO state_dicts for rank 55 successfully loaded 32 ZeRO state_dicts for rank 37 successfully loaded 32 ZeRO state_dicts for rank 36 successfully loaded 32 ZeRO state_dicts for rank 38 successfully loaded 32 ZeRO state_dicts for rank 39 successfully loaded 32 ZeRO state_dicts for rank 53 successfully loaded 32 ZeRO state_dicts for rank 54 successfully loaded 32 ZeRO state_dicts for rank 52 successfully loaded 32 ZeRO state_dicts for rank 9 successfully loaded 32 ZeRO state_dicts for rank 3 successfully loaded 32 ZeRO state_dicts for rank 0 successfully loaded 32 ZeRO state_dicts for rank 1 successfully loaded 32 ZeRO state_dicts for rank 2 successfully loaded 32 ZeRO state_dicts for rank 8 successfully loaded 32 ZeRO state_dicts for rank 11 successfully loaded 32 ZeRO state_dicts for rank 10 successfully loaded 32 ZeRO state_dicts for rank 5 successfully loaded 32 ZeRO state_dicts for rank 4 successfully loaded 32 ZeRO state_dicts for rank 6 successfully loaded 32 ZeRO state_dicts for rank 7 successfully loaded 32 ZeRO state_dicts for rank 19 successfully loaded 32 ZeRO state_dicts for rank 18 successfully loaded 32 ZeRO state_dicts for rank 17 successfully loaded 32 ZeRO state_dicts for rank 12 successfully loaded 32 ZeRO state_dicts for rank 16 successfully loaded 32 ZeRO state_dicts for rank 15 successfully loaded 32 ZeRO state_dicts for rank 14 successfully loaded 32 ZeRO state_dicts for rank 13 successfully loaded 32 ZeRO state_dicts for rank 20 successfully loaded 32 ZeRO state_dicts for rank 23 successfully loaded 32 ZeRO state_dicts for rank 24 successfully loaded 32 ZeRO state_dicts for rank 21 successfully loaded 32 ZeRO state_dicts for rank 22 successfully loaded 32 ZeRO state_dicts for rank 27 successfully loaded 32 ZeRO state_dicts 
for rank 26 successfully loaded 32 ZeRO state_dicts for rank 31 successfully loaded 32 ZeRO state_dicts for rank 25 successfully loaded 32 ZeRO state_dicts for rank 30 successfully loaded 32 ZeRO state_dicts for rank 28 successfully loaded 32 ZeRO state_dicts for rank 44 successfully loaded 32 ZeRO state_dicts for rank 47 successfully loaded 32 ZeRO state_dicts for rank 46 successfully loaded 32 ZeRO state_dicts for rank 45 successfully loaded 32 ZeRO state_dicts for rank 29 loading 32 zero partition checkpoints for rank 60 loading 32 zero partition checkpoints for rank 58 loading 32 zero partition checkpoints for rank 56 loading 32 zero partition checkpoints for rank 57 loading 32 zero partition checkpoints for rank 59 loading 32 zero partition checkpoints for rank 51 loading 32 zero partition checkpoints for rank 50 loading 32 zero partition checkpoints for rank 32 loading 32 zero partition checkpoints for rank 34 loading 32 zero partition checkpoints for rank 49 loading 32 zero partition checkpoints for rank 43 loading 32 zero partition checkpoints for rank 41 loading 32 zero partition checkpoints for rank 40 loading 32 zero partition checkpoints for rank 62 loading 32 zero partition checkpoints for rank 61 loading 32 zero partition checkpoints for rank 63 loading 32 zero partition checkpoints for rank 35 loading 32 zero partition checkpoints for rank 42 loading 32 zero partition checkpoints for rank 33 loading 32 zero partition checkpoints for rank 55 loading 32 zero partition checkpoints for rank 48 loading 32 zero partition checkpoints for rank 37 loading 32 zero partition checkpoints for rank 38 loading 32 zero partition checkpoints for rank 9 loading 32 zero partition checkpoints for rank 36 loading 32 zero partition checkpoints for rank 2 loading 32 zero partition checkpoints for rank 7 loading 32 zero partition checkpoints for rank 39 loading 32 zero partition checkpoints for rank 0 loading 32 zero partition checkpoints for rank 3 loading 32 zero 
partition checkpoints for rank 1 loading 32 zero partition checkpoints for rank 54 loading 32 zero partition checkpoints for rank 10 loading 32 zero partition checkpoints for rank 25 loading 32 zero partition checkpoints for rank 18 loading 32 zero partition checkpoints for rank 17 loading 32 zero partition checkpoints for rank 22 loading 32 zero partition checkpoints for rank 30 loading 32 zero partition checkpoints for rank 20 checkpoint version 3.0 loading 32 zero partition checkpoints for rank 11 loading 32 zero partition checkpoints for rank 16 loading 32 zero partition checkpoints for rank 21 loading 32 zero partition checkpoints for rank 6 loading 32 zero partition checkpoints for rank 23 loading 32 zero partition checkpoints for rank 15 loading 32 zero partition checkpoints for rank 14 loading 32 zero partition checkpoints for rank 26 loading 32 zero partition checkpoints for rank 4 loading 32 zero partition checkpoints for rank 28 loading 32 zero partition checkpoints for rank 13 loading 32 zero partition checkpoints for rank 31 loading 32 zero partition checkpoints for rank 46 loading 32 zero partition checkpoints for rank 8 loading 32 zero partition checkpoints for rank 5 loading 32 zero partition checkpoints for rank 19 loading 32 zero partition checkpoints for rank 45 loading 32 zero partition checkpoints for rank 24 loading 32 zero partition checkpoints for rank 12 loading 32 zero partition checkpoints for rank 29 loading 32 zero partition checkpoints for rank 27 loading 32 zero partition checkpoints for rank 44 loading 32 zero partition checkpoints for rank 47 loading 32 zero partition checkpoints for rank 52 loading 32 zero partition checkpoints for rank 53 successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints at iteration 64156 time (ms) | load-checkpoint: 11480.35 
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
estimated model parameters: 1.42303232
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the 
embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP 
> 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") 
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the 
embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP 
> 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") 
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the 
embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters without embeddings: 1.208598528 [after model, optimizer, and learning rate scheduler are built] datetime: 2021-11-25 01:20:49 > building train, validation, and test datasets ... > datasets target sizes (minimum size): train: 73242187 validation: 7833600 test: 51200 > building train, validation, and test datasets for GPT ... > building dataset index ... reading sizes... reading pointers... reading document index... creating numpy buffer of mmap... creating memory view of numpy buffer... 
> finished creating indexed dataset in 3.288528 seconds
    number of documents: 304230423
> dataset split:
    train: document indices in [0, 288714672) total of 288714672 documents
    validation: document indices in [288714672, 303926193) total of 15211521 documents
    test: document indices in [303926193, 304230423) total of 304230 documents
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.225 seconds
    total number of samples: 131537224
    total number of epochs: 1
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.328 seconds
    total number of samples: 13854322
    total number of epochs: 2
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.086 seconds
total number of samples: 137384 total number of epochs: 1 > finished creating GPT datasets ... [after dataloaders are built] datetime: 2021-11-25 01:20:59 done with setup ... training ... time (ms) | model-and-optimizer-setup: 15410.83 | train/valid/test-data-iterators-setup: 9697.05 Number of parameters: 1.42303232 billion Number of parameters: 1.423040512 billion Number of parameters without embeddings: 1.208598528 billion Number of parameters without embeddings: 1.20860672 billion [before the start of training step] datetime: 2021-11-25 01:20:59 [2021-11-25 01:20:59,668] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information [2021-11-25 01:20:59,668] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False [2021-11-25 01:20:59,668] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 24 total layers [2021-11-25 01:20:59,668] [INFO] [checkpointing.py:554:forward] ----Synchronization False [2021-11-25 01:20:59,668] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
if t.grad is not None: [Rank 32] (after 64200 iterations) memory (MB) | allocated: 2443.63623046875 | max allocated: 4725.25341796875 | reserved: 7900.0 | max reserved: 7900.0 [Rank 0] (after 64200 iterations) memory (MB) | allocated: 1631.6650390625 | max allocated: 3929.2744140625 | reserved: 6816.0 | max reserved: 6816.0 iteration 64200/ 152972 | consumed samples: 27790784 | consumed tokens: 56915525632 | elapsed time per iteration (ms): 4820.4 | learning rate: 1.408E-04 | global batch size: 512 | lm loss: 1.388791E+00 | loss scale: 16384.0 | grad norm: 898.001 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 64400/ 152972 | consumed samples: 27893184 | consumed tokens: 57125240832 | elapsed time per iteration (ms): 4759.5 | learning rate: 1.404E-04 | global batch size: 512 | lm loss: 1.454395E+00 | loss scale: 16384.0 | grad norm: 1633.863 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | saving checkpoint at iteration 64500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-25 01:48:22,395] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/mp_rank_00_model_states.pt [2021-11-25 01:48:22,823] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,824] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,826] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,828] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,829] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,831] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,833] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,833] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,834] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,835] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,836] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,836] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,836] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,837] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,837] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,837] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,838] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,839] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,840] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,840] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,841] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,844] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,844] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,846] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,846] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,847] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,848] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,850] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,851] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,859] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,861] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,862] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,862] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,863] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,864] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,865] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,865] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,866] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,866] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,867] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,869] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,870] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,870] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,871] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,871] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,872] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,872] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,872] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,872] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,872] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,873] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,873] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,875] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,875] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,880] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,882] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,886] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,890] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,902] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-25 01:48:22,904] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,905] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,906] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,907] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-25 01:48:22,916] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step64500/zero_pp_rank_8_mp_rank_01_optim_states.pt successfully saved checkpoint at iteration 64500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2661.15 iteration 64600/ 152972 | consumed samples: 27995584 | consumed tokens: 57334956032 | elapsed time per iteration (ms): 4811.6 | learning rate: 1.400E-04 | global batch size: 512 | lm loss: 1.537228E+00 | loss scale: 32768.0 | grad norm: 3510.613 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 64800/ 152972 | consumed samples: 28097984 | consumed tokens: 57544671232 | elapsed time per iteration (ms): 4778.7 | learning rate: 1.396E-04 | 
global batch size: 512 | lm loss: 1.520069E+00 | loss scale: 32768.0 | grad norm: 3029.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 65000/ 152972 | consumed samples: 28200384 | consumed tokens: 57754386432 | elapsed time per iteration (ms): 4768.8 | learning rate: 1.392E-04 | global batch size: 512 | lm loss: 1.510578E+00 | loss scale: 32768.0 | grad norm: 3719.277 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------- valid loss at iteration 65000 | lm loss value: 1.483656E+00 | lm loss PPL: 4.409036E+00 | ------------------------------------------------------------------------------------------- iteration 65200/ 152972 | consumed samples: 28302784 | consumed tokens: 57964101632 | elapsed time per iteration (ms): 5356.5 | learning rate: 1.388E-04 | global batch size: 512 | lm loss: 1.485796E+00 | loss scale: 65536.0 | grad norm: 5827.695 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 65400/ 152972 | consumed samples: 28405184 | consumed tokens: 58173816832 | elapsed time per iteration (ms): 4775.6 | learning rate: 1.384E-04 | global batch size: 512 | lm loss: 1.501852E+00 | loss scale: 65536.0 | grad norm: 8393.013 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 65600/ 152972 | consumed samples: 28507584 | consumed tokens: 58383532032 | elapsed time per iteration (ms): 4775.8 | learning rate: 1.381E-04 | global batch size: 512 | lm loss: 1.425379E+00 | loss scale: 65536.0 | grad norm: 3384.020 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 65800/ 152972 | consumed samples: 28609984 | consumed tokens: 58593247232 | elapsed time per iteration (ms): 4776.5 | learning rate: 1.377E-04 | global batch size: 512 | lm loss: 1.480716E+00 | loss scale: 65536.0 | grad 
norm: 6824.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [2021-11-25 03:49:46,799] [INFO] [logging.py:68:log_dist] [Rank 0] step=66000, skipped=136, lr=[0.00013727289546712045, 0.00013727289546712045], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 66000 loss: 2.2396 iter time (s): 0.002 samples/sec: 214015.213 iteration 66000/ 152972 | consumed samples: 28712384 | consumed tokens: 58802962432 | elapsed time per iteration (ms): 4773.4 | learning rate: 1.373E-04 | global batch size: 512 | lm loss: 1.471481E+00 | loss scale: 131072.0 | grad norm: 18146.310 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------- valid loss at iteration 66000 | lm loss value: 1.485389E+00 | lm loss PPL: 4.416683E+00 | ------------------------------------------------------------------------------------------- saving checkpoint at iteration 66000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-25 03:51:38,406] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/mp_rank_00_model_states.pt [2021-11-25 03:51:38,826] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,828] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,831] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,831] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,832] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,832] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,835] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,835] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,835] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,835] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,838] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,839] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,840] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,840] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,842] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,842] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,847] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,847] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,847] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,847] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,847] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,852] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,853] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,858] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,858] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,861] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,862] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,864] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,867] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,868] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,871] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,872] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,878] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,886] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,893] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,899] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,900] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,901] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,904] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,904] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,905] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,905] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,905] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,906] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,906] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,906] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,908] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,909] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,909] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,909] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,909] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,909] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,910] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,911] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,911] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,911] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,911] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,911] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,914] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,915] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,916] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,916] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-25 03:51:38,916] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-25 03:51:38,920] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step66000/zero_pp_rank_31_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 66000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2793.67 iteration 66200/ 152972 | consumed samples: 28814784 | consumed tokens: 59012677632 | elapsed time per iteration (ms): 5374.9 | learning rate: 1.369E-04 | global batch size: 512 | lm loss: 1.449137E+00 | loss scale: 131072.0 | grad norm: 12356.416 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 66400/ 152972 | consumed samples: 28917184 | consumed tokens: 59222392832 | elapsed time per iteration (ms): 4768.4 | learning rate: 1.365E-04 | 
global batch size: 512 | lm loss: 1.571567E+00 | loss scale: 131072.0 | grad norm: 14235.403 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 66600/ 152972 | consumed samples: 29019584 | consumed tokens: 59432108032 | elapsed time per iteration (ms): 4768.9 | learning rate: 1.361E-04 | global batch size: 512 | lm loss: 1.514781E+00 | loss scale: 65536.0 | grad norm: 7455.925 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 66800/ 152972 | consumed samples: 29121984 | consumed tokens: 59641823232 | elapsed time per iteration (ms): 4780.0 | learning rate: 1.357E-04 | global batch size: 512 | lm loss: 1.437012E+00 | loss scale: 65536.0 | grad norm: 5164.712 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 67000/ 152972 | consumed samples: 29224384 | consumed tokens: 59851538432 | elapsed time per iteration (ms): 4775.5 | learning rate: 1.353E-04 | global batch size: 512 | lm loss: 1.536021E+00 | loss scale: 65536.0 | grad norm: 7719.546 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 67000 | lm loss value: 1.484674E+00 | lm loss PPL: 4.413527E+00 |
-------------------------------------------------------------------------------------------
iteration 67200/ 152972 | consumed samples: 29326784 | consumed tokens: 60061253632 | elapsed time per iteration (ms): 5315.8 | learning rate: 1.349E-04 | global batch size: 512 | lm loss: 1.516907E+00 | loss scale: 131072.0 | grad norm: 14442.517 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 67400/ 152972 | consumed samples: 29429184 | consumed tokens: 60270968832 | elapsed time per iteration (ms): 4770.2 | learning rate: 1.345E-04 | global batch size: 512 | lm loss: 1.496376E+00 | loss scale: 65536.0 | grad
norm: 7633.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | saving checkpoint at iteration 67500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-25 05:52:57,867] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/mp_rank_00_model_states.pt [2021-11-25 05:52:58,284] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,292] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,293] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,295] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,295] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,297] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,297] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,297] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,298] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,298] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,298] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,299] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,299] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,299] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,299] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,301] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,302] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,302] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,302] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,302] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,302] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,305] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,306] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,308] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,308] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,309] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,310] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,315] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,315] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,321] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,325] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,327] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,328] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,329] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,329] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,329] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,329] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,330] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,330] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,330] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,332] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,332] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,334] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,334] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,334] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,336] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,336] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,336] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,338] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,339] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,340] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,341] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,341] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,343] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,344] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,344] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,347] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,348] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,349] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,355] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,359] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-25 05:52:58,361] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-25 05:52:58,363] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_4_mp_rank_01_optim_states.pt
[2021-11-25 05:52:58,372] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step67500/zero_pp_rank_21_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 67500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2637.98
iteration 67600/ 152972 | consumed samples: 29531584 | consumed tokens: 60480684032 | elapsed time per iteration (ms): 4783.9 | learning rate: 1.341E-04 | global batch size: 512 | lm loss: 1.567088E+00 | loss scale: 65536.0 | grad norm: 5171.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 67800/ 152972 | consumed samples: 29633984 | consumed tokens: 60690399232 | elapsed time per iteration (ms): 4776.1 | learning rate: 1.337E-04 | global batch size: 512 | lm loss: 1.568994E+00 | loss scale: 32768.0 | grad norm: 3626.632 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-25 06:32:43,455] [INFO] [logging.py:68:log_dist] [Rank 0] step=68000, skipped=141, lr=[0.00013331853384533724, 0.00013331853384533724], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 68000 loss: 1.1708 iter time (s): 0.002 samples/sec: 215268.464
iteration 68000/ 152972 | consumed samples: 29736384 | consumed tokens: 60900114432 | elapsed time per iteration (ms): 4769.6 | learning rate: 1.333E-04 | global batch size: 512 | lm loss: 1.502585E+00 | loss scale: 32768.0 | grad norm: 2815.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 68000 | lm loss value: 1.456739E+00 | lm loss PPL: 4.291939E+00 |
-------------------------------------------------------------------------------------------
iteration 68200/ 152972 | consumed samples: 29838784 | consumed tokens: 61109829632 | elapsed time per iteration (ms): 5321.6 | learning rate: 1.329E-04 | global batch size: 512 | lm loss: 1.473392E+00 | loss scale: 65536.0 | grad norm: 5979.464 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 68400/ 152972 | consumed samples: 29941184 | consumed tokens: 61319544832 | elapsed time per iteration (ms): 4779.2 | learning rate: 1.325E-04 | global batch size: 512 | lm loss: 1.491969E+00 | loss scale: 32768.0 | grad norm: 3729.261 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 68600/ 152972 | consumed samples: 30043584 | consumed tokens: 61529260032 | elapsed time per iteration (ms): 4768.6 | learning rate: 1.321E-04 | global batch size: 512 | lm loss: 1.494867E+00 | loss scale: 32768.0 | grad norm: 3068.226 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 68800/ 152972 | consumed samples: 30145984 | consumed tokens: 61738975232 | elapsed time per iteration (ms): 4776.9 | learning rate: 1.317E-04 | global batch size: 512 | lm loss: 1.503663E+00 | loss scale: 32768.0 | grad norm: 4386.098 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 69000/ 152972 | consumed samples: 30248384 | consumed tokens: 61948690432 | elapsed time per iteration (ms): 4770.2 | learning rate: 1.313E-04 | global batch size: 512 | lm loss: 1.478511E+00 | loss scale: 65536.0 | grad norm: 7011.059 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 69000 | lm loss value: 1.473478E+00 | lm loss PPL: 4.364389E+00 |
------------------------------------------------------------------------------------------- saving checkpoint at iteration 69000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-25 07:55:57,906] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/mp_rank_00_model_states.pt [2021-11-25 07:55:58,331] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,331] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,332] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,334] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,335] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,335] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,337] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,337] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,337] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,337] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,338] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,338] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,339] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,339] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,340] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,340] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,340] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,341] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,342] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,343] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,344] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,345] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,345] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,346] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,348] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,348] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,351] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,352] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,354] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,355] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,356] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,357] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,361] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,363] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,365] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,367] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,367] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,368] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,369] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,370] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,370] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,371] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,371] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,372] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,373] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,373] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,373] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,374] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,374] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,375] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,376] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,377] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,377] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,377] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,378] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,379] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,381] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,382] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,383] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,383] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-25 07:55:58,388] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,395] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-25 07:55:58,395] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_18_mp_rank_00_optim_states.pt
[2021-11-25 07:55:58,400] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step69000/zero_pp_rank_1_mp_rank_01_optim_states.pt
successfully saved checkpoint at iteration 69000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2610.89
iteration 69200/ 152972 | consumed samples: 30350784 | consumed tokens: 62158405632 | elapsed time per iteration (ms): 5345.9 | learning rate: 1.309E-04 | global batch size: 512 | lm loss: 1.517792E+00 | loss scale: 65536.0 | grad norm: 9053.033 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 69400/ 152972 | consumed samples: 30453184 | consumed tokens: 62368120832 | elapsed time per iteration (ms): 4783.1 | learning rate: 1.305E-04 | global batch size: 512 | lm loss: 1.466684E+00 | loss scale: 131072.0 | grad norm: 12168.913 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 69600/ 152972 | consumed samples: 30555584 | consumed tokens: 62577836032 | elapsed time per iteration (ms): 4779.3 | learning rate: 1.301E-04 | global batch size: 512 | lm loss: 1.534307E+00 | loss scale: 131072.0 | grad norm: 19157.401 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 69800/ 152972 | consumed samples: 30657984 | consumed tokens: 62787551232 | elapsed time per iteration (ms): 4776.3 | learning rate: 1.297E-04 | global batch size: 512 | lm loss: 1.519920E+00 | loss scale: 131072.0 | grad norm: 16528.326 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-25 09:15:38,473] [INFO] [logging.py:68:log_dist] [Rank 0] step=70000, skipped=144, lr=[0.00012930550505649136, 0.00012930550505649136], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 70000 loss: 0.9974 iter time (s): 0.002 samples/sec: 214824.262
iteration 70000/ 152972 | consumed samples: 30760384 | consumed tokens: 62997266432 | elapsed time per iteration (ms): 4774.0 | learning rate: 1.293E-04 | global batch size: 512 | lm loss: 1.463889E+00 | loss scale: 262144.0 | grad norm: 22060.245 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 70000 | lm loss value: 1.487219E+00 | lm loss PPL: 4.424772E+00 |
-------------------------------------------------------------------------------------------
iteration 70200/ 152972 | consumed samples: 30862784 | consumed tokens: 63206981632 | elapsed time per iteration (ms): 5262.4 | learning rate: 1.289E-04 | global batch size: 512 | lm loss: 1.465612E+00 | loss scale: 65536.0 | grad norm: 7052.065 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 70400/ 152972 | consumed samples: 30965184 | consumed tokens: 63416696832 | elapsed time per iteration (ms): 4637.1 | learning rate: 1.285E-04 | global batch size: 512 | lm loss: 1.493842E+00 | loss scale: 65536.0 | grad norm: 3941.714 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 70500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-25 09:56:24,138] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/mp_rank_00_model_states.pt
[2021-11-25 09:56:24,568] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_22_mp_rank_00_optim_states.pt
[2021-11-25
09:56:24,568] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,569] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-25 09:56:24,570] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,571] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,571] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-25 09:56:24,572] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,573] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-25 09:56:24,573] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,573] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-25 09:56:24,573] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-25 09:56:24,575] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,575] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,575] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-25 09:56:24,575] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,576] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,578] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-25 09:56:24,579] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,581] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,581] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-25 09:56:24,582] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-25 09:56:24,583] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,583] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-25 09:56:24,584] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-25 09:56:24,584] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-25 09:56:24,584] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-25 09:56:24,586] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,587] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-25 09:56:24,588] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,590] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,591] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,596] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,598] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,601] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-25 09:56:24,602] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-25 09:56:24,603] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,605] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,605] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-25 09:56:24,607] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-25 09:56:24,607] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,608] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,608] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-25 09:56:24,610] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-25 09:56:24,610] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,611] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-25 09:56:24,613] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,613] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,613] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,613] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-25 09:56:24,614] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,617] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,620] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,621] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-25 09:56:24,622] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-25 09:56:24,623] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,623] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-25 09:56:24,629] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-25 09:56:24,630] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_17_mp_rank_00_optim_states.pt
[2021-11-25 09:56:24,630] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_26_mp_rank_00_optim_states.pt
[2021-11-25 09:56:24,630] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_16_mp_rank_00_optim_states.pt
[2021-11-25 09:56:24,631] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_9_mp_rank_01_optim_states.pt
[2021-11-25 09:56:24,635] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_20_mp_rank_00_optim_states.pt
[2021-11-25 09:56:24,635] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_23_mp_rank_00_optim_states.pt
[2021-11-25 09:56:24,644] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step70500/zero_pp_rank_14_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 70500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2603.79
iteration 70600/ 152972 | consumed samples: 31067584 | consumed tokens: 63626412032 | elapsed time per iteration (ms): 4665.3 | learning rate: 1.281E-04 | global batch size: 512 | lm loss: 1.512337E+00 | loss scale: 65536.0 | grad norm: 6431.370 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 70800/ 152972 | consumed samples: 31169984 | consumed tokens: 63836127232 | elapsed time per iteration (ms): 4658.5 | learning rate: 1.277E-04 | global batch size: 512 | lm loss: 1.530800E+00 | loss scale: 131072.0 | grad norm: 21867.157 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 71000/ 152972 | consumed samples: 31272384 | consumed tokens: 64045842432 | elapsed time per iteration (ms): 4648.8 | learning rate: 1.273E-04 | global batch size: 512 | lm loss: 1.494953E+00 | loss scale: 131072.0 | grad norm: 15511.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 71000 | lm loss value: 1.407345E+00 | lm loss PPL: 4.085094E+00 |
-------------------------------------------------------------------------------------------
iteration 71200/ 152972 | consumed samples: 31374784 | consumed tokens: 64255557632 | elapsed time per iteration (ms): 5183.5 | learning rate: 1.269E-04 | global batch size: 512 | lm loss: 1.489749E+00 | loss scale: 32768.0 | grad norm: 5790.893 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 71400/ 152972 | consumed samples: 31477184 | consumed tokens: 64465272832 | elapsed time per iteration (ms): 4670.4 | learning rate: 1.265E-04 | global batch size: 512 | lm loss: 1.493322E+00 | loss scale: 32768.0 | grad norm: 3290.012 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 71600/ 152972 | consumed samples: 31579584 | consumed tokens: 64674988032 | elapsed time per iteration (ms): 4673.1 | learning rate: 1.261E-04 | global batch size: 512 | lm loss: 1.472841E+00 | loss scale: 32768.0 | grad norm: 3127.691 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 71800/ 152972 | consumed samples: 31681984 | consumed tokens: 64884703232 | elapsed time per iteration (ms): 4641.0 | learning rate: 1.257E-04 | global batch size: 512 | lm loss: 1.504507E+00 | loss scale: 65536.0 | grad norm: 5879.985 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-25 11:54:35,704] [INFO] [logging.py:68:log_dist] [Rank 0] step=72000, skipped=149, lr=[0.00012524958661119887, 0.00012524958661119887], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 72000/ 152972 | consumed samples: 31784384 | consumed tokens: 65094418432 | elapsed time per iteration (ms): 4645.8 | learning rate: 1.252E-04 | global batch size: 512 | lm loss: 1.548501E+00 | loss scale: 65536.0 | grad norm: 5884.682 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
steps: 72000 loss: 1.1994 iter time (s): 0.002 samples/sec: 220428.227
-------------------------------------------------------------------------------------------
valid loss at iteration 72000 | lm loss value: 1.491381E+00 | lm loss PPL: 4.443227E+00 |
-------------------------------------------------------------------------------------------
saving checkpoint at iteration 72000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-25 11:56:26,282] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/mp_rank_00_model_states.pt
[2021-11-25 11:56:26,703] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_0_mp_rank_01_optim_states.pt
[2021-11-25 11:56:26,704] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-25 11:56:26,707] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-25 11:56:26,709] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-25 11:56:26,709] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-25 11:56:26,709] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-25 11:56:26,711] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-25 11:56:26,712] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-25 11:56:26,712] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-25 11:56:26,713] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-25 11:56:26,714] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-25 11:56:26,715] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-25 11:56:26,715] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-25 11:56:26,715] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-25 11:56:26,715] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-25 11:56:26,716] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-25 11:56:26,716] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-25 11:56:26,716] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-25 11:56:26,717] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-25 11:56:26,717] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-25 11:56:26,717] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-25 11:56:26,718] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-25 11:56:26,718] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-25 11:56:26,718] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-25 11:56:26,719] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-25 11:56:26,719] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-25 11:56:26,719] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-25 11:56:26,720] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-25 11:56:26,721] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-25 11:56:26,722] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-25 11:56:26,734] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-25 11:56:26,737] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-25 11:56:26,742] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-25 11:56:26,746] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-25 11:56:26,746] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-25 11:56:26,747] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-25 11:56:26,748] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-25 11:56:26,749] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-25 11:56:26,749] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-25 11:56:26,751] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-25 11:56:26,752] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-25 11:56:26,752] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-25 11:56:26,752] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-25 11:56:26,752] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-25 11:56:26,753] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-25 11:56:26,754] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-25 11:56:26,755] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-25 11:56:26,755] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-25 11:56:26,755] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-25 11:56:26,758] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-25 11:56:26,758] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-25 11:56:26,760] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-25 11:56:26,761] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-25 11:56:26,761] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-25 11:56:26,766] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-25 11:56:26,766] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-25 11:56:26,767] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-25 11:56:26,769] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_23_mp_rank_00_optim_states.pt
[2021-11-25 11:56:26,770] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_11_mp_rank_01_optim_states.pt
[2021-11-25 11:56:26,778] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_14_mp_rank_00_optim_states.pt
[2021-11-25 11:56:26,780] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_9_mp_rank_01_optim_states.pt
[2021-11-25 11:56:26,781] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_3_mp_rank_01_optim_states.pt
[2021-11-25 11:56:26,785] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_24_mp_rank_00_optim_states.pt
[2021-11-25 11:56:26,787] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step72000/zero_pp_rank_1_mp_rank_01_optim_states.pt
successfully saved checkpoint at iteration 72000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2607.37
iteration 72200/ 152972 | consumed samples: 31886784 | consumed tokens: 65304133632 | elapsed time per iteration (ms): 5211.9 | learning rate: 1.248E-04 | global batch size: 512 | lm loss: 1.470766E+00 | loss scale: 131072.0 | grad norm: 14021.744 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 72400/ 152972 | consumed samples: 31989184 | consumed tokens: 65513848832 | elapsed time per iteration (ms): 4658.1 | learning rate: 1.244E-04 | global batch size: 512 | lm loss: 1.456137E+00 | loss scale: 131072.0 | grad norm: 12951.743 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 72600/ 152972 | consumed samples: 32091584 | consumed tokens: 65723564032 | elapsed time per iteration (ms): 4641.6 | learning rate: 1.240E-04 | global batch size: 512 | lm loss: 1.497809E+00 | loss scale: 131072.0 | grad norm: 15342.563 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 72800/ 152972 | consumed samples: 32193984 | consumed tokens: 65933279232 | elapsed time per iteration (ms): 4639.5 | learning rate: 1.236E-04 | global batch size: 512 | lm loss: 1.494703E+00 | loss scale: 65536.0 | grad norm: 8077.934 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 73000/ 152972 | consumed samples: 32296384 | consumed tokens: 66142994432 | elapsed time per iteration (ms): 4647.0 | learning rate: 1.232E-04 | global batch size: 512 | lm loss: 1.532679E+00 | loss scale: 65536.0 | grad norm: 5942.615 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 73000 | lm loss value: 1.561029E+00 | lm loss PPL: 4.763722E+00 |
-------------------------------------------------------------------------------------------
iteration 73200/ 152972 | consumed samples: 32398784 | consumed tokens: 66352709632 | elapsed time per iteration (ms): 5204.8 | learning rate: 1.228E-04 | global batch size: 512 | lm loss: 1.475368E+00 | loss scale: 131072.0 | grad norm: 11266.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 73400/ 152972 | consumed samples: 32501184 | consumed tokens: 66562424832 | elapsed time per iteration (ms): 4656.9 | learning rate: 1.224E-04 | global batch size: 512 | lm loss: 1.484997E+00 | loss scale: 65536.0 | grad norm: 6480.742 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 73500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-25 13:54:33,167] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/mp_rank_00_model_states.pt
[2021-11-25 13:54:33,586] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_1_mp_rank_01_optim_states.pt
[2021-11-25 13:54:33,590] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_26_mp_rank_00_optim_states.pt
[2021-11-25 13:54:33,593] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_19_mp_rank_00_optim_states.pt
[2021-11-25 13:54:33,595] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_29_mp_rank_01_optim_states.pt
[2021-11-25 13:54:33,596] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_10_mp_rank_00_optim_states.pt
[2021-11-25 13:54:33,596] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-25 13:54:33,597] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-25 13:54:33,598] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-25 13:54:33,599] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-25 13:54:33,599] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-25 13:54:33,600] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-25 13:54:33,600] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-25 13:54:33,600] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-25 13:54:33,600] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-25 13:54:33,600] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-25 13:54:33,601] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-25 13:54:33,602] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-25 13:54:33,602] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-25 13:54:33,603] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-25 13:54:33,603] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-25 13:54:33,604] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-25 13:54:33,604] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-25 13:54:33,605] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-25 13:54:33,605] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-25 13:54:33,605] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-25 13:54:33,607] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-25 13:54:33,607] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-25 13:54:33,607] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-25 13:54:33,612] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-25 13:54:33,612] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-25 13:54:33,613] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-25 13:54:33,619] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-25 13:54:33,620] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-25 13:54:33,628] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-25 13:54:33,630] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-25 13:54:33,631] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-25 13:54:33,631] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-25 13:54:33,631] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-25 13:54:33,632] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-25 13:54:33,632] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-25 13:54:33,633] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-25 13:54:33,633] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-25 13:54:33,633] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-25 13:54:33,634] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-25 13:54:33,634] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-25 13:54:33,636] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-25 13:54:33,636] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-25 13:54:33,637] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-25 13:54:33,638] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-25 13:54:33,638] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-25 13:54:33,638] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-25 13:54:33,640] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-25 13:54:33,641] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-25 13:54:33,642] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-25 13:54:33,642] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-25 13:54:33,642] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-25 13:54:33,643] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-25 13:54:33,644] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-25 13:54:33,647] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-25 13:54:33,648] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-25 13:54:33,649] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-25 13:54:33,655] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-25 13:54:33,662] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-25 13:54:33,664] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step73500/zero_pp_rank_12_mp_rank_01_optim_states.pt successfully saved checkpoint at iteration 73500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2592.11 iteration 73600/ 152972 | consumed samples: 32603584 | consumed tokens: 66772140032 | elapsed time per iteration (ms): 4653.6 | learning rate: 1.220E-04 | global batch size: 512 | lm loss: 1.465757E+00 | loss scale: 65536.0 | grad norm: 3341.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 73800/ 152972 | consumed samples: 32705984 | consumed tokens: 66981855232 | elapsed time per iteration (ms): 4647.3 | learning rate: 1.216E-04 | global batch size: 512 | lm loss: 1.524856E+00 | loss scale: 32768.0 | grad norm: 2721.344 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [2021-11-25 14:33:17,157] [INFO] [logging.py:68:log_dist] [Rank 0] step=74000, skipped=154, lr=[0.00012115460896212103, 0.00012115460896212103], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 74000 loss: 1.2412 iter time (s): 0.002 samples/sec: 220273.778 iteration 74000/ 152972 | consumed samples: 32808384 | consumed tokens: 67191570432 | elapsed time per iteration (ms): 4646.6 | learning rate: 1.212E-04 | global batch size: 512 | lm loss: 1.483615E+00 | loss scale: 32768.0 | grad norm: 2823.060 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 74000 | lm loss value: 1.474615E+00 | lm loss PPL: 4.369353E+00 |
-------------------------------------------------------------------------------------------
iteration 74200/ 152972 | consumed samples: 32910784 | consumed tokens: 67401285632 | elapsed time per iteration (ms): 5189.0 | learning rate: 1.207E-04 | global batch size: 512 | lm loss: 1.535925E+00 | loss scale: 8192.0 | grad norm: 1494.770 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 74400/ 152972 | consumed samples: 33013184 | consumed tokens: 67611000832 | elapsed time per iteration (ms): 4659.7 | learning rate: 1.203E-04 | global batch size: 512 | lm loss: 1.489320E+00 | loss scale: 8192.0 | grad norm: 1080.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 74600/ 152972 | consumed samples: 33115584 | consumed tokens: 67820716032 | elapsed time per iteration (ms): 4655.4 | learning rate: 1.199E-04 | global batch size: 512 | lm loss: 1.507036E+00 | loss scale: 8192.0 | grad norm: 2758.045 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 74800/ 152972 | consumed samples: 33217984 | consumed tokens: 68030431232 | elapsed time per iteration (ms): 4649.9 | learning rate: 1.195E-04 | global batch size: 512 | lm loss: 1.500013E+00 | loss scale: 16384.0 | grad norm: 1770.999 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 75000/ 152972 | consumed samples: 33320384 | consumed tokens: 68240146432 | elapsed time per iteration (ms): 4654.1 | learning rate: 1.191E-04 | global batch size: 512 | lm loss: 1.481998E+00 | loss scale: 16384.0 | grad norm: 964.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
------------------------------------------------------------------------------------------- valid loss at iteration 75000 | lm loss value: 1.496127E+00 | lm loss PPL: 4.464366E+00 | ------------------------------------------------------------------------------------------- saving checkpoint at iteration 75000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-25 15:54:29,837] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/mp_rank_00_model_states.pt [2021-11-25 15:54:30,260] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,261] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,261] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,262] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,263] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,267] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,268] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,268] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,270] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,270] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,272] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,272] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,273] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,274] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,275] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,275] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,276] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,278] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,278] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,278] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,278] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,285] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,285] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,289] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,291] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,291] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,293] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,293] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,294] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,295] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,296] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,296] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,296] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,298] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,301] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,303] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,304] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,304] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,304] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,305] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,306] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,309] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,311] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,311] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,312] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,312] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,314] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,315] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,318] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,319] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,319] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,320] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,322] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,328] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,329] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,333] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,333] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,336] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,338] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,344] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,347] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,350] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-25 15:54:30,351] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-25 15:54:30,354] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step75000/zero_pp_rank_15_mp_rank_01_optim_states.pt successfully saved checkpoint at iteration 75000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2555.85 iteration 75200/ 152972 | consumed samples: 33422784 | consumed tokens: 68449861632 | elapsed time per iteration (ms): 5224.6 | learning rate: 1.187E-04 | global batch size: 512 | lm loss: 1.524306E+00 | loss scale: 32768.0 | grad norm: 3086.874 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 75400/ 152972 | consumed samples: 33525184 | consumed tokens: 68659576832 | elapsed time per iteration (ms): 4645.4 | learning rate: 1.183E-04 | global batch size: 512 | lm loss: 1.500375E+00 | loss scale: 32768.0 | grad norm: 3240.400 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 75600/ 152972 | consumed samples: 33627584 | consumed tokens: 68869292032 | elapsed time per iteration (ms): 4652.4 | learning rate: 1.179E-04 | global batch size: 512 | lm loss: 1.460076E+00 | loss scale: 32768.0 | grad norm: 3564.710 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 75800/ 152972 | consumed samples: 33729984 | consumed tokens: 69079007232 | elapsed time per iteration (ms): 4652.3 | learning rate: 1.174E-04 | global batch size: 512 | lm loss: 1.486643E+00 | loss scale: 
65536.0 | grad norm: 6280.947 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [2021-11-25 17:12:04,205] [INFO] [logging.py:68:log_dist] [Rank 0] step=76000, skipped=156, lr=[0.00011702224667532497, 0.00011702224667532497], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 76000 loss: 1.7454 iter time (s): 0.002 samples/sec: 220159.353 iteration 76000/ 152972 | consumed samples: 33832384 | consumed tokens: 69288722432 | elapsed time per iteration (ms): 4652.4 | learning rate: 1.170E-04 | global batch size: 512 | lm loss: 1.460991E+00 | loss scale: 65536.0 | grad norm: 8278.453 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------- valid loss at iteration 76000 | lm loss value: 1.406685E+00 | lm loss PPL: 4.082400E+00 | ------------------------------------------------------------------------------------------- iteration 76200/ 152972 | consumed samples: 33934784 | consumed tokens: 69498437632 | elapsed time per iteration (ms): 5195.9 | learning rate: 1.166E-04 | global batch size: 512 | lm loss: 1.515475E+00 | loss scale: 131072.0 | grad norm: 7905.772 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 76400/ 152972 | consumed samples: 34037184 | consumed tokens: 69708152832 | elapsed time per iteration (ms): 4649.7 | learning rate: 1.162E-04 | global batch size: 512 | lm loss: 1.486129E+00 | loss scale: 131072.0 | grad norm: 12315.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | saving checkpoint at iteration 76500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-25 17:52:40,113] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/mp_rank_00_model_states.pt [2021-11-25 17:52:40,533] 
[INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,536] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,536] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,537] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,538] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,540] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,540] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,541] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,541] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,543] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,545] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,547] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,547] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,549] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,549] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,550] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,551] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,554] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,554] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,557] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,557] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,560] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,560] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,560] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,560] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,562] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,564] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,565] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,565] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,567] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,571] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,572] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,574] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,576] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,576] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,576] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,576] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,576] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,578] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,578] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,579] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,579] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,580] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,580] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,580] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,581] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,582] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,583] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,584] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,585] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,585] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,585] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,586] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,588] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,591] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,593] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,593] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,594] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,597] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,598] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,598] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,599] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-25 17:52:40,619] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-25 17:52:40,624] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step76500/zero_pp_rank_0_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 76500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2575.67 iteration 
76600/ 152972 | consumed samples: 34139584 | consumed tokens: 69917868032 | elapsed time per iteration (ms): 4664.0 | learning rate: 1.158E-04 | global batch size: 512 | lm loss: 1.501978E+00 | loss scale: 65536.0 | grad norm: 7611.607 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 76800/ 152972 | consumed samples: 34241984 | consumed tokens: 70127583232 | elapsed time per iteration (ms): 4647.1 | learning rate: 1.154E-04 | global batch size: 512 | lm loss: 1.528411E+00 | loss scale: 65536.0 | grad norm: 4125.379 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 77000/ 152972 | consumed samples: 34344384 | consumed tokens: 70337298432 | elapsed time per iteration (ms): 4640.3 | learning rate: 1.149E-04 | global batch size: 512 | lm loss: 1.531680E+00 | loss scale: 131072.0 | grad norm: 12829.645 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 77000 | lm loss value: 1.463746E+00 | lm loss PPL: 4.322122E+00 |
-------------------------------------------------------------------------------------------
iteration 77200/ 152972 | consumed samples: 34446784 | consumed tokens: 70547013632 | elapsed time per iteration (ms): 5289.2 | learning rate: 1.145E-04 | global batch size: 512 | lm loss: 1.469191E+00 | loss scale: 65536.0 | grad norm: 5036.747 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 77400/ 152972 | consumed samples: 34549184 | consumed tokens: 70756728832 | elapsed time per iteration (ms): 4640.8 | learning rate: 1.141E-04 | global batch size: 512 | lm loss: 1.480313E+00 | loss scale: 32768.0 | grad norm: 3833.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 77600/ 152972 | consumed samples: 34651584 | consumed tokens: 70966444032 | elapsed time per iteration (ms): 4642.4 | learning rate: 1.137E-04 | global batch size: 512 | lm loss: 1.533694E+00 | loss scale: 32768.0 | grad norm: 1919.989 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 77800/ 152972 | consumed samples: 34753984 | consumed tokens: 71176159232 | elapsed time per iteration (ms): 4642.3 | learning rate: 1.133E-04 | global batch size: 512 | lm loss: 1.484447E+00 | loss scale: 32768.0 | grad norm: 3477.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-25 19:50:54,600] [INFO] [logging.py:68:log_dist] [Rank 0] step=78000, skipped=161, lr=[0.00011287287812300848, 0.00011287287812300848], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 78000 loss: 1.6072 iter time (s): 0.002 samples/sec: 220820.748
iteration 78000/ 152972 | consumed samples: 34856384 | consumed tokens: 71385874432 | elapsed time per iteration (ms): 4640.2 | learning rate: 1.129E-04 | global batch size: 512 | lm loss: 1.487332E+00 | loss scale: 65536.0 | grad norm: 6999.023 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 78000 | lm loss value: 1.447810E+00 | lm loss PPL: 4.253788E+00 |
-------------------------------------------------------------------------------------------
saving checkpoint at iteration 78000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-25 19:53:02,460] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/mp_rank_00_model_states.pt
[2021-11-25 19:53:02,891] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,893] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,895] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,897] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,897] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,898] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,899] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,900] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,901] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,901] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,903] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,903] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,905] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,906] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,906] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,906] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,906] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,907] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,908] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,908] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,909] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,912] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,912] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,913] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,913] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,914] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,917] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,920] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,923] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,926] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,928] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,928] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,929] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,930] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,931] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,931] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,932] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,934] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,937] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,937] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,937] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,938] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,939] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,939] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,940] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,940] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,941] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,941] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,941] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,943] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,944] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,944] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,944] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,945] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,947] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,949] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,951] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-25 19:53:02,955] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,959] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,961] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,962] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,964] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,966] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-25 19:53:02,969] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78000/zero_pp_rank_13_mp_rank_01_optim_states.pt successfully saved checkpoint at iteration 78000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2747.00 iteration 
78200/ 152972 | consumed samples: 34958784 | consumed tokens: 71595589632 | elapsed time per iteration (ms): 5294.6 | learning rate: 1.125E-04 | global batch size: 512 | lm loss: 1.510645E+00 | loss scale: 65536.0 | grad norm: 5610.810 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 78400/ 152972 | consumed samples: 35061184 | consumed tokens: 71805304832 | elapsed time per iteration (ms): 4655.1 | learning rate: 1.120E-04 | global batch size: 512 | lm loss: 1.483753E+00 | loss scale: 131072.0 | grad norm: 16549.933 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 78600/ 152972 | consumed samples: 35163584 | consumed tokens: 72015020032 | elapsed time per iteration (ms): 4642.6 | learning rate: 1.116E-04 | global batch size: 512 | lm loss: 1.459196E+00 | loss scale: 131072.0 | grad norm: 13634.901 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 78800/ 152972 | consumed samples: 35265984 | consumed tokens: 72224735232 | elapsed time per iteration (ms): 4641.4 | learning rate: 1.112E-04 | global batch size: 512 | lm loss: 1.494021E+00 | loss scale: 131072.0 | grad norm: 12317.104 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 78983 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-25 21:09:14,926] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/mp_rank_00_model_states.pt
[2021-11-25 21:09:15,348] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_24_mp_rank_01_optim_states.pt
[2021-11-25 21:09:15,352] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,355] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,356] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,356] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,356] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,357] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,358] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,358] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,359] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,359] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,360] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,361] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,362] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,362] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,362] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,363] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,364] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,364] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,366] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,367] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,369] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,370] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,370] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,373] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,374] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,377] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,380] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,380] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,381] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,382] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,382] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,386] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,386] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,386] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,387] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,389] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,389] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,393] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,394] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,396] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,397] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,397] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,397] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,400] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,400] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,401] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,401] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,402] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,402] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,402] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,403] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,404] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,406] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,407] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,407] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,407] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,411] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,412] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,412] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,412] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,414] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-25 21:09:15,417] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-25 21:09:15,419] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step78983/zero_pp_rank_4_mp_rank_01_optim_states.pt successfully saved checkpoint at iteration 78983 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2625.73 [exiting program after 1190.0178006211916 minutes] datetime: 2021-11-25 21:09:15 ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, 
please further tune the variable for optimal performance in your application as needed.
*****************************************
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report--------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY][OKAY] [OKAY] -------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name-------------------------------------------------- op name................ op name ................op nameinstalled ................installed.. ................ installed ..installed compatible .. .. compatiblecompatible-------------------------------------------------- compatible---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES]cpu_adam cpu_adam cpu_adam ............... .................................... [YES] [OKAY] [YES][YES]...... ............[OKAY] [OKAY][OKAY]fused_adam ............. [YES] fused_adam...... [OKAY]............. fused_adamfused_adam[YES] fused_lamb...... ..........................[OKAY] ............. [YES][YES] [YES]...... ............fused_lamb[OKAY] [OKAY] [OKAY] ............. fused_lamb [YES] .............fused_lamb...... [YES].............[OKAY] ......[YES] sparse_attn[OKAY]...... -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ............[OKAY] [NO] ....... 
[OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report-------------------------------------------------- -------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja sparse_attn ............ [NO] transformer....... sparse_attn ............[OKAY] sparse_attn............[YES] transformer..................[NO] ............[NO][OKAY] ....... [YES] .......[OKAY]...... [OKAY][OKAY]stochastic_transformer .transformer transformer [YES]stochastic_transformer............ ...... ............. [YES] [OKAY] [YES]......[YES] ......[OKAY]...... [OKAY][OKAY] stochastic_transformer . stochastic_transformer[YES] ....... [YES][OKAY] ...... [OKAY] ninjaninjaninjaninja .................. ...................................................... 
[OKAY][OKAY][OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- op name op name op name ................op name................ ................installed................ installed ....installed installed compatiblecompatible .. .. -------------------------------------------------- --------------------------------------------------compatiblecompatible -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] ......cpu_adam cpu_adam [OKAY]cpu_adam............... ............... [YES] ............... [YES] ...... [YES]fused_adam...... .............[OKAY] [OKAY]...... [YES] [OKAY]...... [OKAY]fused_adam fused_adam fused_lamb.......................... fused_adam [YES]............. [YES]...................[YES] [OKAY][YES]............ [OKAY] ...... [OKAY] fused_lamb [OKAY] ............. [YES]fused_lamb fused_lamb...... ............. ............. sparse_attn[OKAY][YES] [YES].................. ......[NO][OKAY] [OKAY]....... [OKAY] sparse_attn transformer............ ............[NO] [YES]....... ......sparse_attn[OKAY] [OKAY]sparse_attn -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ............ transformer ............ stochastic_transformer [NO][NO]............ ....... . 
.......[OKAY][YES][YES] [OKAY]............transformer -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report [OKAY] [OKAY] transformer -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ............ ............[YES] [YES]stochastic_transformer ...... ...... . [OKAY] [OKAY] [YES] ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ...... stochastic_transformer[OKAY]stochastic_transformer . .[YES] [YES]...... ......[OKAY] [OKAY] ninjaninjaninja ninja .................................... .................. .................. [OKAY] [OKAY] [OKAY][OKAY] -------------------------------------------------- -------------------------------------------------- --------------------------------------------------op name--------------------------------------------------op name op name................op name................ ................installedinstalled .................. 
installed .. compatible..installed compatiblecompatible-------------------------------------------------- --------------------------------------------------.. -------------------------------------------------- compatible -------------------------------------------------- cpu_adamcpu_adam cpu_adam ............... ............... ...............[YES]cpu_adam [YES][YES]...... ............... ............[OKAY][YES] [OKAY] ...... [OKAY] [OKAY] fused_adam fused_adam............. ............. fused_adam[YES][YES]fused_adam ...................................... [OKAY][YES][YES][OKAY] ............ [OKAY][OKAY]fused_lambfused_lamb fused_lamb.......................... fused_lamb[YES][YES]............. ...... ...................[OKAY][YES] [OKAY] [YES] ...... ......[OKAY] [OKAY] sparse_attn sparse_attn............ ............[NO] sparse_attn.......sparse_attn [NO] ............[OKAY]................... [NO] [OKAY] [NO] transformer....... transformer................... [OKAY]............[YES][OKAY] ......[YES] transformer [OKAY] transformer...... ............ ............[OKAY][YES] stochastic_transformer[YES]...... stochastic_transformer .......[OKAY] .[YES][OKAY] [YES]...... stochastic_transformer [OKAY]....... stochastic_transformer [OKAY] [YES] . ......[YES] [OKAY]...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................. .................................... .................. [OKAY] [OKAY] [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op name op name op name ................................................ ................installedinstalledinstalled installed.... .. ..compatible compatible compatible compatible -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- cpu_adamcpu_adam cpu_adam ............... cpu_adam ..............................[YES] [YES] ..................... [YES] ...... [YES] [OKAY]...... [OKAY] ......[OKAY] [OKAY] fused_adamfused_adam .......................... [YES][YES]fused_adam ............fused_adam ............. 
.............[OKAY] [OKAY] [YES][YES] fused_lamb............ .............fused_lamb [OKAY] [OKAY].............[YES] [YES]fused_lamb...... fused_lamb...................[OKAY] .............[YES][OKAY] [YES]...... ......[OKAY] [OKAY] sparse_attn ............sparse_attn [NO]............ .......[NO] [OKAY]sparse_attn .......sparse_attn ............[OKAY]............transformer [NO][NO]............ transformer.......[YES]....... ............[OKAY][OKAY]...... [YES][OKAY] ......transformer transformer[OKAY]............stochastic_transformer ............[YES] .stochastic_transformer [YES] ...... . [YES] ...... [OKAY][YES] ...... [OKAY]......[OKAY] stochastic_transformer[OKAY] . stochastic_transformer[YES] ....... [OKAY][YES] ...... [OKAY] -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja-------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY] [OKAY][OKAY] -------------------------------------------------- ------------------------------------------------------------------------------------------------------------------------------------------------------ op name op name op name................op name ................installed................................ installedinstalledinstalled.. ....compatible compatiblecompatible-------------------------------------------------- --------------------------------------------------..-------------------------------------------------- compatible --------------------------------------------------cpu_adam ............... cpu_adam[YES]cpu_adam .................................... [OKAY][YES][YES] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- cpu_adam ...... ...... [OKAY][OKAY] --------------------------------------------------DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- ...............fused_adam [YES]............. ......[YES] fused_adamfused_adam...... .............[OKAY].............[OKAY] -------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja [YES] [YES]fused_lamb...... ...................[OKAY] [YES][OKAY] ...... fused_lamb [OKAY]............. fused_adamfused_lamb [YES].......................... ...... [YES] [YES] [OKAY] ...... sparse_attn ......[OKAY] ............[OKAY] [NO] ....... [OKAY] fused_lamb transformer............. sparse_attn ............ ............ sparse_attn[YES] [NO] ............ ...... [NO].......[OKAY] ....... [OKAY][YES][OKAY] stochastic_transformer ...... transformer .[OKAY]transformer [YES]........................ ......[YES][YES] [OKAY]............ [OKAY][OKAY] sparse_attn ............ stochastic_transformer[NO]stochastic_transformer ......... [OKAY][YES] [YES] ............transformer [OKAY]............[OKAY] [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] ninjaninjaninja ninja .................................... 
.................. .................. [OKAY][OKAY] [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- -------------------------------------------------- op name--------------------------------------------------op nameop name ................................................ op name installedinstalledinstalled ...................... compatible compatiblecompatible installed ---------------------------------------------------------------------------------------------------- -------------------------------------------------- .. compatible -------------------------------------------------- cpu_adamcpu_adamcpu_adam ............................................. [YES][YES][YES]cpu_adam ................................. [OKAY][OKAY][OKAY][YES] ...... [OKAY] fused_adamfused_adamfused_adam ..........................fused_adam............. [YES] [YES][YES] ......................... [OKAY]......[OKAY] [YES] [OKAY]fused_lambfused_lamb ...... ..........................fused_lamb[OKAY] [YES]............. [YES][YES] .................. fused_lamb[OKAY] [OKAY][OKAY]............. [YES] ...... [OKAY] sparse_attn ............ sparse_attn[NO]sparse_attn ............ ....... ............ [NO] [OKAY]sparse_attn [NO] ....... ...................[OKAY]transformer [OKAY][NO]............ transformer[YES]....... transformer.................. ............[OKAY][OKAY][YES] [YES] ............ stochastic_transformer [OKAY] [OKAY] transformer. ............stochastic_transformer[YES]stochastic_transformer [YES]........ [YES][YES][OKAY] ...... ............ [OKAY][OKAY][OKAY] stochastic_transformer . [YES] ...... [OKAY] ninjaninjaninja ninja .................. .................................... .................. 
[OKAY] [OKAY] [OKAY] [OKAY]-------------------------------------------------- -------------------------------------------------- op name---------------------------------------------------------------------------------------------------- op name ................ op name op name ................installed................ installed..................installed .. installedcompatible .. compatible ..-------------------------------------------------- compatible -------------------------------------------------- compatible -------------------------------------------------- -------------------------------------------------- cpu_adam cpu_adam............... cpu_adam...............cpu_adam [YES] [YES].................................... ...... [YES][YES][OKAY] [OKAY]............ [OKAY][OKAY] fused_adam fused_adam............. .............[YES] fused_adam[YES] fused_adam ...... ................... ............. [OKAY][YES] [OKAY] [YES]......fused_lamb fused_lamb................... [OKAY] [YES] ............. ......[OKAY]fused_lamb [YES] [OKAY] ...................fused_lamb [OKAY][YES]............. ......[YES] [OKAY]...... sparse_attn[OKAY] ............ [NO] ....... [OKAY]sparse_attn ............ transformer[NO]sparse_attn ............................... sparse_attn[YES] [OKAY] [NO].................. transformer[OKAY].......[NO] ............ [OKAY].......[YES] stochastic_transformer ......[OKAY].transformer [OKAY] [YES]............transformer ......[YES] stochastic_transformer ............[OKAY] ...... .[YES] [YES][OKAY] ...... ...... [OKAY][OKAY]stochastic_transformer . stochastic_transformer[YES] ....... [YES][OKAY] ...... [OKAY] ninjaninjaninjaninja .................. ...................................................... 
[OKAY][OKAY] [OKAY][OKAY]---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op nameop name op name ................................op name installed................installed ................ .. installed ..installed compatible ....compatible-------------------------------------------------- compatiblecompatible-------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ............... [YES]cpu_adam ..................... cpu_adam cpu_adam[OKAY] [YES] ............... .....................[YES] [YES][OKAY]...... ......[OKAY]fused_adam [OKAY]............. [YES] ......fused_adam [OKAY]............. fused_adam[YES]fused_adam ...................fused_lamb ............. [OKAY][YES]............. [YES] ......fused_lamb[YES]...... ...................[OKAY][OKAY] [YES][OKAY]fused_lamb fused_lamb................... .............[YES][OKAY] [YES]...... ......[OKAY] [OKAY]sparse_attn ............ [NO] ....... [OKAY] sparse_attn transformer............ ............[NO] sparse_attn [YES]sparse_attn ..................................... [OKAY][OKAY][NO] [NO] ..............stochastic_transformertransformer [OKAY].............[OKAY] [YES][YES]transformer transformer .................................... [OKAY][YES][YES][OKAY] ............ [OKAY]stochastic_transformer[OKAY] . [YES]stochastic_transformer stochastic_transformer...... ..[OKAY] [YES][YES] ............ [OKAY][OKAY] -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ...................................................... .................. [OKAY] [OKAY][OKAY][OKAY]-------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- op nameop name op name ................ op name................................ installedinstalledinstalled................ .. .... installed compatiblecompatible compatible.. ------------------------------------------------------------------------------------------------------------------------------------------------------compatible -------------------------------------------------- cpu_adam cpu_adam...............cpu_adamcpu_adam [YES]............................................. ......[YES] [OKAY][YES]......[YES] ............ [OKAY] [OKAY][OKAY]fused_adam ............. [YES] ...... [OKAY] fused_adamfused_adamfused_adam fused_lamb.......................... ............. ............. [YES][YES] [YES][YES] .................. ...... [OKAY][OKAY] [OKAY] [OKAY] fused_lambfused_lamb .............fused_lamb............. 
[YES].............[YES] ......[YES]sparse_attn ...... [OKAY] .................. [OKAY] [NO][OKAY] ....... [OKAY] transformer ............ [YES] ...... sparse_attnsparse_attn[OKAY] sparse_attn ............ ............[NO] stochastic_transformer ............[NO]....... . [NO][OKAY].......[YES] .............[OKAY]transformer [OKAY] [OKAY] ............ transformer[YES] transformer ............ .................. [YES][OKAY][YES] ............ [OKAY][OKAY] stochastic_transformer . [YES]stochastic_transformerstochastic_transformer ........ [OKAY][YES][YES] ...... ......[OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja-------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninjaDeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninja ......................................................ninja [OKAY] ..................[OKAY][OKAY] [OKAY]---------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------op name op name op name................op name ................ ................ installedinstalled................installed .. installed..compatible.. ..compatible-------------------------------------------------- compatible compatible ---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] cpu_adam...... cpu_adam [OKAY] ............... cpu_adam............... [YES][YES] ............... ...... ......fused_adam [YES] [OKAY][OKAY]............. [YES]...... ...... [OKAY][OKAY]fused_adam fused_adam ..........................fused_lamb [YES][YES]............. ............fused_adam[YES] [OKAY] [OKAY]............. ...... fused_lamb[OKAY]fused_lamb[YES] .......................... ......[YES][YES] ......[OKAY]...... [OKAY][OKAY] sparse_attn ............fused_lamb [NO] .................... [OKAY] sparse_attn[YES]sparse_attntransformer .......................................... [NO][NO][YES] [OKAY] ....... ............. [OKAY][OKAY][OKAY] transformer transformer............ stochastic_transformer [YES] ............ ....... [YES]sparse_attn [YES][OKAY]...... ..................[OKAY] [OKAY]stochastic_transformer[NO] . .......stochastic_transformer[YES] [OKAY]....... [OKAY][YES] transformer...... [OKAY]............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] ninjaninjaninja .................................... 
..................ninja [OKAY][OKAY] [OKAY] .................. ------------------------------------------------------------------------------------------------------------------------------------------------------[OKAY] op nameop name-------------------------------------------------- op name ................ ................ op name................ installed ................installed installed ..installed .. ..compatible..compatible compatiblecompatible-------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam cpu_adam..............................cpu_adam ............... [YES][YES] ............... [YES]...... [OKAY]............[YES] [OKAY][OKAY]...... [OKAY] fused_adam ............. [YES] fused_adam......fused_adam [OKAY]fused_adam ............. ............. fused_lamb .............[YES] [YES]............. ......[YES][YES]...... [OKAY]............[OKAY] [OKAY][OKAY]fused_lamb fused_lamb............. .............fused_lamb [YES] .............[YES]...... [YES][OKAY] ......sparse_attn ...... [OKAY]............[OKAY] [NO] ....... [OKAY] transformersparse_attn ........................ [YES] sparse_attn[NO]...... sparse_attn............ [OKAY]....... ............[NO][OKAY] stochastic_transformer[NO]....... transformer . ....... [OKAY][YES]............[OKAY] ...... [YES]transformertransformer [OKAY].............................. [YES][OKAY][YES] ............ stochastic_transformer[OKAY][OKAY] . [YES] ......stochastic_transformerstochastic_transformer [OKAY].. [YES][YES] ...... ......[OKAY] [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ninja................ installed ..................ninja.. 
[OKAY] ..................ninja compatible ..................--------------------------------------------------[OKAY] -------------------------------------------------- [OKAY] --------------------------------------------------op name --------------------------------------------------op name................ ................op namecpu_adam installed installed................ ................. .. installedcompatiblecompatible [YES] .. -------------------------------------------------- ...... --------------------------------------------------compatible [OKAY] -------------------------------------------------- cpu_adam ............... cpu_adam[YES] fused_adam ...... ............................ cpu_adam[OKAY] [YES][YES]............... ............[YES] [OKAY][OKAY]...... fused_adam [OKAY]............. [YES]fused_lamb ......fused_adam ............. [OKAY]fused_adam[YES]............. ...................[YES]fused_lamb [OKAY][YES]............. ...... ......[YES] [OKAY][OKAY]...... [OKAY] fused_lambfused_lamb ..........................sparse_attn [YES]............[YES] ......[NO] ...... sparse_attn[OKAY] ....... ............[OKAY] [OKAY] [NO] .......transformer [OKAY] ............ [YES] transformer...... sparse_attn............ sparse_attn [OKAY] ............ [YES] ............ [NO]stochastic_transformer ...... [NO]........ [OKAY].......[OKAY] [YES] [OKAY]...... stochastic_transformertransformer[OKAY] transformer............ .............[YES] [YES] ......[YES]...... [OKAY] ......[OKAY] [OKAY] stochastic_transformer .stochastic_transformer [YES] ....... [YES][OKAY] ...... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.async_io ............... [NO] ....... [NO] async_io ............... [NO] ....... [NO] transformer_inference .. transformer_inference[NO] ......... [NO] ....... [OKAY][OKAY] utils .................. [YES]utils ........................ [OKAY][YES] ...... 
quantizer .............. [NO] ....... [OKAY][OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.-------------------------------------------------- quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  async_io: please install the libaio-devel package with yumasync_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. 
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] .......
[OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO]  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] async_io...... [OKAY]............... [NO] ....... quantizer[NO] .............. [NO] ....... [OKAY] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... async_io[NO] ...................... [NO][NO] ....... [NO] transformer_inference ..transformer_inference [NO].. .......[NO] [OKAY]....... [OKAY] utils ..................utils [YES].................. ......[YES] [OKAY]...... [OKAY] quantizer ..............quantizer [NO] ..................... [NO][OKAY] ....... [OKAY] -------------------------------------------------- --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO]  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. transformer_inference .. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY]  [WARNING]  async_io: please install the libaio-devel package with yum -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... 
[NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] async_io ............... [NO]transformer_inference ......... [NO][NO] ....... [OKAY] utils .................. [YES]transformer_inference ........ [OKAY][NO] ....... [OKAY] quantizer .............. utils[NO] ......................... [YES][OKAY] ...... [OKAY] -------------------------------------------------- quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed general environment info:  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: DeepSpeed general environment info:torch install path ............... torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']torch version .................... torch version1.8.2 .................... torch cuda version1.8.2 ............... torch cuda version11.1 ...............nvcc version 11.1..................... nvcc version11.2 .....................deepspeed install path 11.2........... 
deepspeed install path ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']........... deepspeed info ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']................... deepspeed info0.5.5+58a8e13, 58a8e13, master ...................deepspeed wheel compiled w. 0.5.5+58a8e13, 58a8e13, master...... deepspeed wheel compiled w.torch 1.8, cuda 11.1 ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 
0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] .......
[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... 
[OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY]  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  
[WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 
11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 
11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. 
[WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. 
...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... 
[NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY]  [WARNING]  async_io: please install the libaio-devel package with yum quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.21.8.2 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed infodeepspeed info ...................................... 
0.5.5+58a8e13, 58a8e13, master0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path .............................. ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version torch version.................... ....................1.8.2 1.8.2 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed infodeepspeed info ...................................... 0.5.5+58a8e13, 58a8e13, master0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 
11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1
using world size: 64, data-parallel-size: 32, tensor-model-parallel size: 1, pipeline-model-parallel size: 2
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. None
consumed_train_samples .......................... 0
consumed_train_tokens ........................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
curriculum_learning ............................. False
data_impl ....................................... mmap
data_parallel_size .............................. 32
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.9486.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 100
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 8192
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
gigaflos_no_embeds .............................. 0
global_batch_size ............................... 512
glu_activation .................................. None
hidden_dropout .................................. 0.1
hidden_size ..................................... 2048
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 128
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 200
log_learning_rate_to_tensorboard ................ True
log_level ....................................... None
log_level_replica ............................... None
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_on_targets_only ............................ True
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 0.0002
lr_decay_iters .................................. None
lr_decay_samples ................................ 73242187
lr_decay_style .................................. cosine
lr_decay_tokens ................................. None
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 183105
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/data/gpt2-merges.txt
micro_batch_size ................................ 1
min_loss_scale .................................. 1.0
min_lr .......................................... 1e-05
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 16
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 24
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 2
position_embedding_type ......................... PositionEmbeddingType.absolute
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['32', '32', '2_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
reweight_loss_based_on_position_frequency ....... True
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 1234
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 1
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/tr6g-1B3-oscar-loss-reweighting-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
test_weighted_split_names ....................... None
test_weighted_split_paths ....................... None
test_weighted_split_splits ...................... None
test_weighted_split_weights ..................... None
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 73242187
train_tokens .................................... None
train_weighted_split_paths ...................... None
use_bnb_optimizer ............................... False
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
valid_weighted_split_names ...................... None
valid_weighted_split_paths ...................... None
valid_weighted_split_splits ..................... None
valid_weighted_split_weights .................... None
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 64
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
> building GPT2BPETokenizer tokenizer ...
> padded vocab (size: 50257) with 47 dummy tokens (new size: 50304)
> initializing torch distributed ...
> setting tensorboard ...
> initializing tensor model parallel with size 1
> initializing pipeline model parallel with size 2
> setting random seeds to 1234 ...
[2021-11-25 21:09:49,032] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.355 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( >>> done with compiling and loading fused kernels.
Compilation time: 23.993 seconds time to initialize megatron (seconds): 37.382 [after megatron is initialized] datetime: 2021-11-25 21:10:13 building GPT model ... [2021-11-25 21:10:13,429] [INFO] [utils.py:806:see_memory_usage] Before Building Model [2021-11-25 21:10:13,430] [INFO] [utils.py:807:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB [2021-11-25 21:10:13,430] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 38.82 GB, percent = 20.7% SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pipe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=1, data=0, model=0): 32, ProcessCoord(pipe=1, 
data=1, model=0): 33, ProcessCoord(pipe=1, data=2, model=0): 34, ProcessCoord(pipe=1, data=3, model=0): 35, ProcessCoord(pipe=1, data=4, model=0): 36, ProcessCoord(pipe=1, data=5, model=0): 37, ProcessCoord(pipe=1, data=6, model=0): 38, ProcessCoord(pipe=1, data=7, model=0): 39, ProcessCoord(pipe=1, data=8, model=0): 40, ProcessCoord(pipe=1, data=9, model=0): 41, ProcessCoord(pipe=1, data=10, model=0): 42, ProcessCoord(pipe=1, data=11, model=0): 43, ProcessCoord(pipe=1, data=12, model=0): 44, ProcessCoord(pipe=1, data=13, model=0): 45, ProcessCoord(pipe=1, data=14, model=0): 46, ProcessCoord(pipe=1, data=15, model=0): 47, ProcessCoord(pipe=1, data=16, model=0): 48, ProcessCoord(pipe=1, data=17, model=0): 49, ProcessCoord(pipe=1, data=18, model=0): 50, ProcessCoord(pipe=1, data=19, model=0): 51, ProcessCoord(pipe=1, data=20, model=0): 52, ProcessCoord(pipe=1, data=21, model=0): 53, ProcessCoord(pipe=1, data=22, model=0): 54, ProcessCoord(pipe=1, data=23, model=0): 55, ProcessCoord(pipe=1, data=24, model=0): 56, ProcessCoord(pipe=1, data=25, model=0): 57, ProcessCoord(pipe=1, data=26, model=0): 58, ProcessCoord(pipe=1, data=27, model=0): 59, ProcessCoord(pipe=1, data=28, model=0): 60, ProcessCoord(pipe=1, data=29, model=0): 61, ProcessCoord(pipe=1, data=30, model=0): 62, ProcessCoord(pipe=1, data=31, model=0): 63} [2021-11-25 21:10:14,728] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer stage=0 layers=15 0: _to_float16 1: EmbeddingPipe 2: 3: ParallelTransformerLayerPipe 4: ParallelTransformerLayerPipe 5: ParallelTransformerLayerPipe 6: ParallelTransformerLayerPipe 7: ParallelTransformerLayerPipe 8: ParallelTransformerLayerPipe 9: ParallelTransformerLayerPipe 10: ParallelTransformerLayerPipe 11: ParallelTransformerLayerPipe 12: ParallelTransformerLayerPipe 13: ParallelTransformerLayerPipe 14: ParallelTransformerLayerPipe stage=1 layers=17 15: ParallelTransformerLayerPipe 16: ParallelTransformerLayerPipe 17: 
ParallelTransformerLayerPipe 18: ParallelTransformerLayerPipe 19: ParallelTransformerLayerPipe 20: ParallelTransformerLayerPipe 21: ParallelTransformerLayerPipe 22: ParallelTransformerLayerPipe 23: ParallelTransformerLayerPipe 24: ParallelTransformerLayerPipe 25: ParallelTransformerLayerPipe 26: ParallelTransformerLayerPipe 27: 28: 29: MixedFusedLayerNorm 30: EmbeddingPipe 31: float16_to_fp32 loss: CrossEntropy > number of parameters on (tensor, pipeline) model parallel rank (0, 1): 711520256 [2021-11-25 21:10:15,274] [INFO] [utils.py:806:see_memory_usage] After Building Model [2021-11-25 21:10:15,274] [INFO] [utils.py:807:see_memory_usage] MA 1.33 GB Max_MA 1.33 GB CA 1.36 GB Max_CA 1 GB [2021-11-25 21:10:15,275] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 39.04 GB, percent = 20.8% > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 711516160 setting training iterations to 152972 > learning rate decay style: cosine DeepSpeed is enabled. [2021-11-25 21:10:15,294] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.5.5+58a8e13, git-hash=58a8e13, git-branch=master [2021-11-25 21:10:15,629] [INFO] [engine.py:207:__init__] DeepSpeed Flops Profiler Enabled: False [2021-11-25 21:10:15,629] [INFO] [engine.py:862:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer [2021-11-25 21:10:15,629] [INFO] [engine.py:868:_configure_optimizer] Using client Optimizer as basic optimizer [2021-11-25 21:10:15,633] [INFO] [engine.py:884:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam [2021-11-25 21:10:15,633] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type= [2021-11-25 21:10:15,633] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer [2021-11-25 21:10:15,633] [INFO] [stage2.py:111:__init__] Reduce bucket size 500000000 [2021-11-25 21:10:15,633] [INFO] [stage2.py:112:__init__] Allgather bucket size 500000000 
[2021-11-25 21:10:15,633] [INFO] [stage2.py:113:__init__] CPU Offload: False [2021-11-25 21:10:15,633] [INFO] [stage2.py:114:__init__] Round robin gradient partitioning: False Rank: 38 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 47 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 35 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 42 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 50 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 59 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 7 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 19 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 36 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 32 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 57 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 41 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 49 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 44 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 8 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 14 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 20 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 4 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 17 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 25 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 53 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 27 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 22 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 61 partition 
count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 52 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 29 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 15 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 0 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 11 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 60 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 28 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 1 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 48 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 40 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 33 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 39 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 13 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 16 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 9 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 2 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 54 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 51 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 45 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 21 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 56 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 58 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 12 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 46 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 43 partition count [32, 32] and sizes[(22224896, 
False), (10112, False)] Rank: 5 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 3 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 55 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 34 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 6 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 63 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 24 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 18 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 37 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 23 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 10 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 31 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 30 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 62 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 26 partition count [32, 32] and sizes[(22224896, False), (9984, False)] [2021-11-25 21:10:17,215] [INFO] [utils.py:806:see_memory_usage] Before initializing optimizer states [2021-11-25 21:10:17,216] [INFO] [utils.py:807:see_memory_usage] MA 1.41 GB Max_MA 1.45 GB CA 2.71 GB Max_CA 3 GB [2021-11-25 21:10:17,216] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.85 GB, percent = 21.8% [2021-11-25 21:10:17,258] [INFO] [utils.py:806:see_memory_usage] After initializing optimizer states [2021-11-25 21:10:17,258] [INFO] [utils.py:807:see_memory_usage] MA 1.57 GB Max_MA 1.66 GB CA 2.96 GB Max_CA 3 GB [2021-11-25 21:10:17,258] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.99 GB, percent = 21.9% [2021-11-25 21:10:17,258] [INFO] [stage2.py:474:__init__] optimizer state initialized [2021-11-25 21:10:17,284] [INFO] 
[utils.py:806:see_memory_usage] After initializing ZeRO optimizer [2021-11-25 21:10:17,284] [INFO] [utils.py:807:see_memory_usage] MA 1.57 GB Max_MA 1.57 GB CA 2.96 GB Max_CA 3 GB [2021-11-25 21:10:17,285] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.99 GB, percent = 21.9% [2021-11-25 21:10:17,285] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [2021-11-25 21:10:17,285] [INFO] [engine.py:599:_configure_lr_scheduler] DeepSpeed using client LR scheduler [2021-11-25 21:10:17,285] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2021-11-25 21:10:17,285] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)] [2021-11-25 21:10:17,285] [INFO] [config.py:940:print] DeepSpeedEngine configuration: [2021-11-25 21:10:17,285] [INFO] [config.py:944:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2021-11-25 21:10:17,285] [INFO] [config.py:944:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2021-11-25 21:10:17,285] [INFO] [config.py:944:print] allreduce_always_fp32 ........ False [2021-11-25 21:10:17,285] [INFO] [config.py:944:print] amp_enabled .................. False [2021-11-25 21:10:17,285] [INFO] [config.py:944:print] amp_params ................... False [2021-11-25 21:10:17,285] [INFO] [config.py:944:print] checkpoint_tag_validation_enabled True [2021-11-25 21:10:17,285] [INFO] [config.py:944:print] checkpoint_tag_validation_fail False [2021-11-25 21:10:17,285] [INFO] [config.py:944:print] curriculum_enabled ........... False [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] curriculum_params ............ 
False [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] dataloader_drop_last ......... False [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] disable_allgather ............ False [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] dump_state ................... False [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1} [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] eigenvalue_enabled ........... False [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] eigenvalue_gas_boundary_resolution 1 [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] eigenvalue_layer_name ........ bert.encoder.layer [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] eigenvalue_layer_num ......... 0 [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] eigenvalue_max_iter .......... 100 [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] eigenvalue_stability ......... 1e-06 [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] eigenvalue_tol ............... 0.01 [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] eigenvalue_verbose ........... False [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] elasticity_enabled ........... False [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] fp16_enabled ................. True [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] fp16_master_weights_and_gradients False [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] fp16_mixed_quantize .......... False [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] global_rank .................. 0 [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] gradient_accumulation_steps .. 
16 [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] gradient_clipping ............ 1.0 [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] gradient_predivide_factor .... 1.0 [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] initial_dynamic_scale ........ 4096 [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] loss_scale ................... 0 [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] memory_breakdown ............. False [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] optimizer_legacy_fusion ...... False [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] optimizer_name ............... None [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] optimizer_params ............. None [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] pld_enabled .................. False [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] pld_params ................... False [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] prescale_gradients ........... False [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] quantize_change_rate ......... 0.001 [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] quantize_groups .............. 1 [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] quantize_offset .............. 1000 [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] quantize_period .............. 1000 [2021-11-25 21:10:17,286] [INFO] [config.py:944:print] quantize_rounding ............ 0 [2021-11-25 21:10:17,287] [INFO] [config.py:944:print] quantize_start_bits .......... 16 [2021-11-25 21:10:17,287] [INFO] [config.py:944:print] quantize_target_bits ......... 8 [2021-11-25 21:10:17,287] [INFO] [config.py:944:print] quantize_training_enabled .... False [2021-11-25 21:10:17,287] [INFO] [config.py:944:print] quantize_type ................ 
0 [2021-11-25 21:10:17,287] [INFO] [config.py:944:print] quantize_verbose ............. False [2021-11-25 21:10:17,287] [INFO] [config.py:944:print] scheduler_name ............... None [2021-11-25 21:10:17,287] [INFO] [config.py:944:print] scheduler_params ............. None [2021-11-25 21:10:17,287] [INFO] [config.py:944:print] sparse_attention ............. None [2021-11-25 21:10:17,287] [INFO] [config.py:944:print] sparse_gradients_enabled ..... False [2021-11-25 21:10:17,287] [INFO] [config.py:944:print] steps_per_print .............. 2000 [2021-11-25 21:10:17,287] [INFO] [config.py:944:print] tensorboard_enabled .......... False [2021-11-25 21:10:17,287] [INFO] [config.py:944:print] tensorboard_job_name ......... DeepSpeedJobName [2021-11-25 21:10:17,287] [INFO] [config.py:944:print] tensorboard_output_path ...... [2021-11-25 21:10:17,287] [INFO] [config.py:944:print] train_batch_size ............. 512 [2021-11-25 21:10:17,287] [INFO] [config.py:944:print] train_micro_batch_size_per_gpu 1 [2021-11-25 21:10:17,287] [INFO] [config.py:944:print] use_quantizer_kernel ......... False [2021-11-25 21:10:17,287] [INFO] [config.py:944:print] wall_clock_breakdown ......... False [2021-11-25 21:10:17,287] [INFO] [config.py:944:print] world_size ................... 32 [2021-11-25 21:10:17,287] [INFO] [config.py:944:print] zero_allow_untested_optimizer False [2021-11-25 21:10:17,287] [INFO] [config.py:944:print] zero_config .................. 
{ "stage": 1, "contiguous_gradients": true, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false } [2021-11-25 21:10:17,287] [INFO] [config.py:944:print] zero_enabled ................. True [2021-11-25 21:10:17,287] [INFO] [config.py:944:print] zero_optimization_stage ...... 1 [2021-11-25 21:10:17,287] [INFO] [config.py:946:print] json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 512, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false } [2021-11-25 21:10:17,287] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=16 micro_batch_size=1 [2021-11-25 21:10:17,316] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=15 [0, 15) STAGE_PARAMS=711516160 (711.516M) TOTAL_PARAMS=1423036416 (1423.036M) UNIQUE_PARAMS=1315819520 (1315.820M) [2021-11-25 21:10:17,316] [INFO] [engine.py:151:__init__] RANK=32 STAGE=1 LAYERS=17 [15, 32) STAGE_PARAMS=711520256 (711.520M) TOTAL_PARAMS=1423036416 (1423.036M) UNIQUE_PARAMS=1315819520 (1315.820M) > using checkpoint value 0.0002 for learning rate > using checkpoint value 1e-05 for minimum learning rate > using checkpoint value 183105 for warmup iterations > using checkpoint value 73242187 for total number of iterations > using checkpoint value cosine for decay style successfully loaded 32 ZeRO 
state_dicts for rank 51 successfully loaded 32 ZeRO state_dicts for rank 61 successfully loaded 32 ZeRO state_dicts for rank 49 successfully loaded 32 ZeRO state_dicts for rank 33 successfully loaded 32 ZeRO state_dicts for rank 34 successfully loaded 32 ZeRO state_dicts for rank 47 successfully loaded 32 ZeRO state_dicts for rank 48 successfully loaded 32 ZeRO state_dicts for rank 44 successfully loaded 32 ZeRO state_dicts for rank 45 successfully loaded 32 ZeRO state_dicts for rank 63 successfully loaded 32 ZeRO state_dicts for rank 32 successfully loaded 32 ZeRO state_dicts for rank 60 successfully loaded 32 ZeRO state_dicts for rank 50 successfully loaded 32 ZeRO state_dicts for rank 62 successfully loaded 32 ZeRO state_dicts for rank 54 successfully loaded 32 ZeRO state_dicts for rank 35 successfully loaded 32 ZeRO state_dicts for rank 52 successfully loaded 32 ZeRO state_dicts for rank 53 successfully loaded 32 ZeRO state_dicts for rank 55 successfully loaded 32 ZeRO state_dicts for rank 46 successfully loaded 32 ZeRO state_dicts for rank 39 successfully loaded 32 ZeRO state_dicts for rank 37 successfully loaded 32 ZeRO state_dicts for rank 5 successfully loaded 32 ZeRO state_dicts for rank 38 successfully loaded 32 ZeRO state_dicts for rank 36 successfully loaded 32 ZeRO state_dicts for rank 28 successfully loaded 32 ZeRO state_dicts for rank 58 successfully loaded 32 ZeRO state_dicts for rank 57 successfully loaded 32 ZeRO state_dicts for rank 4 successfully loaded 32 ZeRO state_dicts for rank 23 successfully loaded 32 ZeRO state_dicts for rank 30 successfully loaded 32 ZeRO state_dicts for rank 31 successfully loaded 32 ZeRO state_dicts for rank 8 successfully loaded 32 ZeRO state_dicts for rank 10 successfully loaded 32 ZeRO state_dicts for rank 9 successfully loaded 32 ZeRO state_dicts for rank 22 successfully loaded 32 ZeRO state_dicts for rank 7 successfully loaded 32 ZeRO state_dicts for rank 59 successfully loaded 32 ZeRO state_dicts for rank 56
successfully loaded 32 ZeRO state_dicts for rank 20 successfully loaded 32 ZeRO state_dicts for rank 11 successfully loaded 32 ZeRO state_dicts for rank 29 successfully loaded 32 ZeRO state_dicts for rank 21 successfully loaded 32 ZeRO state_dicts for rank 3 successfully loaded 32 ZeRO state_dicts for rank 2 successfully loaded 32 ZeRO state_dicts for rank 6 successfully loaded 32 ZeRO state_dicts for rank 1 successfully loaded 32 ZeRO state_dicts for rank 0 successfully loaded 32 ZeRO state_dicts for rank 18 successfully loaded 32 ZeRO state_dicts for rank 16 successfully loaded 32 ZeRO state_dicts for rank 17 successfully loaded 32 ZeRO state_dicts for rank 19 successfully loaded 32 ZeRO state_dicts for rank 26 successfully loaded 32 ZeRO state_dicts for rank 24 successfully loaded 32 ZeRO state_dicts for rank 27 successfully loaded 32 ZeRO state_dicts for rank 25 successfully loaded 32 ZeRO state_dicts for rank 14 successfully loaded 32 ZeRO state_dicts for rank 15 successfully loaded 32 ZeRO state_dicts for rank 13 successfully loaded 32 ZeRO state_dicts for rank 12 successfully loaded 32 ZeRO state_dicts for rank 42 successfully loaded 32 ZeRO state_dicts for rank 41 successfully loaded 32 ZeRO state_dicts for rank 40 successfully loaded 32 ZeRO state_dicts for rank 43 loading 32 zero partition checkpoints for rank 47 loading 32 zero partition checkpoints for rank 44 loading 32 zero partition checkpoints for rank 32 loading 32 zero partition checkpoints for rank 45 loading 32 zero partition checkpoints for rank 55 loading 32 zero partition checkpoints for rank 61 loading 32 zero partition checkpoints for rank 48 loading 32 zero partition checkpoints for rank 63 loading 32 zero partition checkpoints for rank 50 loading 32 zero partition checkpoints for rank 52 loading 32 zero partition checkpoints for rank 46 loading 32 zero partition checkpoints for rank 28 loading 32 zero partition checkpoints for rank 4 loading 32 zero partition checkpoints for rank 23 loading 32
zero partition checkpoints for rank 62 loading 32 zero partition checkpoints for rank 5 loading 32 zero partition checkpoints for rank 11 loading 32 zero partition checkpoints for rank 57 loading 32 zero partition checkpoints for rank 29 loading 32 zero partition checkpoints for rank 31 loading 32 zero partition checkpoints for rank 8 loading 32 zero partition checkpoints for rank 38 loading 32 zero partition checkpoints for rank 51 loading 32 zero partition checkpoints for rank 30 loading 32 zero partition checkpoints for rank 17 loading 32 zero partition checkpoints for rank 37 loading 32 zero partition checkpoints for rank 39 loading 32 zero partition checkpoints for rank 10 loading 32 zero partition checkpoints for rank 9 loading 32 zero partition checkpoints for rank 20 loading 32 zero partition checkpoints for rank 34 loading 32 zero partition checkpoints for rank 18 loading 32 zero partition checkpoints for rank 36 loading 32 zero partition checkpoints for rank 22 loading 32 zero partition checkpoints for rank 59 loading 32 zero partition checkpoints for rank 7 loading 32 zero partition checkpoints for rank 58 loading 32 zero partition checkpoints for rank 49 loading 32 zero partition checkpoints for rank 60 loading 32 zero partition checkpoints for rank 6 loading 32 zero partition checkpoints for rank 16 loading 32 zero partition checkpoints for rank 56 loading 32 zero partition checkpoints for rank 19 loading 32 zero partition checkpoints for rank 1 loading 32 zero partition checkpoints for rank 35 loading 32 zero partition checkpoints for rank 2 loading 32 zero partition checkpoints for rank 3 loading 32 zero partition checkpoints for rank 21 loading 32 zero partition checkpoints for rank 0 loading 32 zero partition checkpoints for rank 25 loading 32 zero partition checkpoints for rank 33 loading 32 zero partition checkpoints for rank 24 loading 32 zero partition checkpoints for rank 27 checkpoint version 3.0 loading 32 zero partition checkpoints for rank 
15 loading 32 zero partition checkpoints for rank 26 loading 32 zero partition checkpoints for rank 14 loading 32 zero partition checkpoints for rank 12 loading 32 zero partition checkpoints for rank 13 loading 32 zero partition checkpoints for rank 41 loading 32 zero partition checkpoints for rank 42 loading 32 zero partition checkpoints for rank 43 loading 32 zero partition checkpoints for rank 40 loading 32 zero partition checkpoints for rank 54 loading 32 zero partition checkpoints for rank 53 successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints at iteration 78983 time (ms) | load-checkpoint: 11267.90 /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP
> 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") 
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the 
embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP 
> 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") 
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the 
embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
estimated model parameters: 1.42303232
estimated model parameters without embeddings: 1.208598528
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-11-25 21:10:28
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
train: 73242187
validation: 7833600
test: 51200
> building train, validation, and test datasets for GPT ...
> building dataset index ...
reading sizes...
reading pointers...
reading document index...
creating numpy buffer of mmap...
creating memory view of numpy buffer...
> finished creating indexed dataset in 12.640091 seconds
number of documents: 304230423
> dataset split:
train: document indices in [0, 288714672) total of 288714672 documents
validation: document indices in [288714672, 303926193) total of 15211521 documents
test: document indices in [303926193, 304230423) total of 304230 documents
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
loaded indexed file in 0.167 seconds
total number of samples: 131537224
total number of epochs: 1
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
loaded indexed file in 0.346 seconds
total number of samples: 13854322
total number of epochs: 2
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
loaded indexed file in 0.065 seconds
total number of samples: 137384
total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-11-25 21:10:47
done with setup ...
training ...
time (ms) | model-and-optimizer-setup: 15234.91 | train/valid/test-data-iterators-setup: 18645.43
Number of parameters: 1.423040512 billion
Number of parameters: 1.42303232 billion
Number of parameters without embeddings: 1.20860672 billion
Number of parameters without embeddings: 1.208598528 billion
[before the start of training step] datetime: 2021-11-25 21:10:47
[2021-11-25 21:10:47,793] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information
[2021-11-25 21:10:47,793] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-11-25 21:10:47,794] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-11-25 21:10:47,794] [INFO] [checkpointing.py:554:forward] ----Synchronization False
[2021-11-25 21:10:47,794] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False
/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is
not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
if t.grad is not None:
[Rank 0] (after 79000 iterations) memory (MB) | allocated: 1631.6650390625 | max allocated: 3929.2744140625 | reserved: 6816.0 | max reserved: 6816.0
[Rank 32] (after 79000 iterations) memory (MB) | allocated: 2443.63623046875 | max allocated: 4725.25341796875 | reserved: 7900.0 | max reserved: 7900.0
iteration 79000/ 152972 | consumed samples: 35368384 | consumed tokens: 72434450432 | elapsed time per iteration (ms): 4806.8 | learning rate: 1.108E-04 | global batch size: 512 | lm loss: 1.381355E+00 | loss scale: 262144.0 | grad norm: 23209.051 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 79000 | lm loss value: 1.494617E+00 | lm loss PPL: 4.457628E+00 |
-------------------------------------------------------------------------------------------
iteration 79200/ 152972 | consumed samples: 35470784 | consumed tokens: 72644165632 | elapsed time per iteration (ms): 5498.3 | learning rate: 1.104E-04 | global batch size: 512 | lm loss: 1.479248E+00 | loss scale: 65536.0 | grad norm: 7491.694 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 79400/ 152972 | consumed samples: 35573184 | consumed tokens: 72853880832 | elapsed time per iteration (ms): 4657.7 | learning rate: 1.100E-04 | global batch size: 512 | lm loss: 1.444912E+00 | loss scale: 65536.0 | grad norm: 4777.535 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 79500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-25 21:54:00,266] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/mp_rank_00_model_states.pt
[2021-11-25 21:54:00,697] [INFO]
[engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,699] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,701] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,702] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,702] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,703] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,704] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,706] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,706] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,707] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,707] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,708] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,708] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,709] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,709] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,710] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,710] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,711] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,715] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,717] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,717] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,717] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,720] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,727] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,729] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,730] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,733] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,733] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,733] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,735] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,737] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,738] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,739] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,740] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,740] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,741] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,741] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,741] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,743] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,743] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,744] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,746] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,746] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,747] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,748] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,749] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,751] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,753] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,753] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,754] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,754] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,756] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,757] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,759] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,760] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,761] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,761] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,762] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,763] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,766] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,767] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,784] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-25 21:54:00,787] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-25 21:54:00,793] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step79500/zero_pp_rank_9_mp_rank_01_optim_states.pt successfully saved checkpoint at iteration 79500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2841.52 iteration 
79600/ 152972 | consumed samples: 35675584 | consumed tokens: 73063596032 | elapsed time per iteration (ms): 4830.2 | learning rate: 1.095E-04 | global batch size: 512 | lm loss: 1.491770E+00 | loss scale: 65536.0 | grad norm: 7161.998 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 79800/ 152972 | consumed samples: 35777984 | consumed tokens: 73273311232 | elapsed time per iteration (ms): 4775.8 | learning rate: 1.091E-04 | global batch size: 512 | lm loss: 1.469163E+00 | loss scale: 131072.0 | grad norm: 10393.959 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-25 22:34:29,408] [INFO] [logging.py:68:log_dist] [Rank 0] step=80000, skipped=165, lr=[0.00010870623370159022, 0.00010870623370159022], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 80000 loss: 1.5814 iter time (s): 0.002 samples/sec: 218743.373
iteration 80000/ 152972 | consumed samples: 35880384 | consumed tokens: 73483026432 | elapsed time per iteration (ms): 4938.5 | learning rate: 1.087E-04 | global batch size: 512 | lm loss: 1.502557E+00 | loss scale: 131072.0 | grad norm: 14039.424 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 80000 | lm loss value: 1.430774E+00 | lm loss PPL: 4.181936E+00 |
-------------------------------------------------------------------------------------------
iteration 80200/ 152972 | consumed samples: 35982784 | consumed tokens: 73692741632 | elapsed time per iteration (ms): 5690.8 | learning rate: 1.083E-04 | global batch size: 512 | lm loss: 1.465081E+00 | loss scale: 131072.0 | grad norm: 13950.153 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 80400/ 152972 | consumed samples: 36085184 | consumed tokens: 73902456832 | elapsed time per iteration (ms): 5032.7 | learning rate: 1.079E-04 | global batch size: 512 | lm loss: 1.473477E+00 | loss scale: 131072.0 | grad norm: 11029.666 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 80600/ 152972 | consumed samples: 36187584 | consumed tokens: 74112172032 | elapsed time per iteration (ms): 4942.7 | learning rate: 1.075E-04 | global batch size: 512 | lm loss: 1.472850E+00 | loss scale: 131072.0 | grad norm: 13217.739 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 80800/ 152972 | consumed samples: 36289984 | consumed tokens: 74321887232 | elapsed time per iteration (ms): 4761.1 | learning rate: 1.070E-04 | global batch size: 512 | lm loss: 1.464116E+00 | loss scale: 131072.0 | grad norm: 16333.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 81000/ 152972 | consumed samples: 36392384 | consumed tokens: 74531602432 | elapsed time per iteration (ms): 4714.3 | learning rate: 1.066E-04 | global batch size: 512 | lm loss: 1.481708E+00 | loss scale: 131072.0 | grad norm: 16368.606 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 81000 | lm loss value: 1.432148E+00 | lm loss PPL: 4.187683E+00 |
-------------------------------------------------------------------------------------------
saving checkpoint at iteration 81000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-26 00:00:35,174] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/mp_rank_00_model_states.pt
[2021-11-26 00:00:35,592] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,594] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,595] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,599] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,599] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,600] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,601] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,601] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,601] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,603] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,604] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,606] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,607] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,608] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,608] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,609] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,612] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,612] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,613] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,614] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,616] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,616] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,617] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,617] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,622] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,624] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,624] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,624] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,625] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,626] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,626] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,627] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,630] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,630] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,631] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,633] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,634] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,635] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,636] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,637] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,637] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,637] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,639] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,639] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,639] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,640] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,640] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,640] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,640] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,642] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,642] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,643] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,645] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,645] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,646] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,646] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,648] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,648] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,649] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-26 00:00:35,650] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,651] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,655] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,655] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-26 00:00:35,661] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step81000/zero_pp_rank_23_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 81000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2734.87 iteration 
81200/ 152972 | consumed samples: 36494784 | consumed tokens: 74741317632 | elapsed time per iteration (ms): 5370.9 | learning rate: 1.062E-04 | global batch size: 512 | lm loss: 1.495950E+00 | loss scale: 65536.0 | grad norm: 7264.725 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 81400/ 152972 | consumed samples: 36597184 | consumed tokens: 74951032832 | elapsed time per iteration (ms): 4646.1 | learning rate: 1.058E-04 | global batch size: 512 | lm loss: 1.485108E+00 | loss scale: 65536.0 | grad norm: 8324.058 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 81600/ 152972 | consumed samples: 36699584 | consumed tokens: 75160748032 | elapsed time per iteration (ms): 4631.1 | learning rate: 1.054E-04 | global batch size: 512 | lm loss: 1.488576E+00 | loss scale: 131072.0 | grad norm: 12816.474 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 81800/ 152972 | consumed samples: 36801984 | consumed tokens: 75370463232 | elapsed time per iteration (ms): 4637.8 | learning rate: 1.049E-04 | global batch size: 512 | lm loss: 1.462934E+00 | loss scale: 131072.0 | grad norm: 14486.399 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-26 01:18:03,487] [INFO] [logging.py:68:log_dist] [Rank 0] step=82000, skipped=168, lr=[0.00010453034167022062, 0.00010453034167022062], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 82000 loss: 2.2899 iter time (s): 0.002 samples/sec: 221451.423
iteration 82000/ 152972 | consumed samples: 36904384 | consumed tokens: 75580178432 | elapsed time per iteration (ms): 4642.8 | learning rate: 1.045E-04 | global batch size: 512 | lm loss: 1.459093E+00 | loss scale: 131072.0 | grad norm: 18661.588 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at
iteration 82000 | lm loss value: 1.463018E+00 | lm loss PPL: 4.318974E+00 | ------------------------------------------------------------------------------------------- iteration 82200/ 152972 | consumed samples: 37006784 | consumed tokens: 75789893632 | elapsed time per iteration (ms): 5264.3 | learning rate: 1.041E-04 | global batch size: 512 | lm loss: 1.445537E+00 | loss scale: 262144.0 | grad norm: 27661.084 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 82400/ 152972 | consumed samples: 37109184 | consumed tokens: 75999608832 | elapsed time per iteration (ms): 4649.0 | learning rate: 1.037E-04 | global batch size: 512 | lm loss: 1.520405E+00 | loss scale: 262144.0 | grad norm: 24577.664 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | saving checkpoint at iteration 82500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-26 01:58:52,428] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/mp_rank_00_model_states.pt [2021-11-26 01:58:52,849] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,854] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,858] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,859] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,860] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,862] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,864] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,864] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,866] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,867] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,868] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,868] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,870] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,871] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,871] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,872] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,872] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,872] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,874] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,875] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,875] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,878] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,878] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,879] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,879] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,880] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,880] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,883] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,886] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,888] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,890] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,891] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,892] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,892] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,892] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,894] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,895] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,896] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,896] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,896] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,897] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,897] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,898] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,898] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,898] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,899] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,899] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,899] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,900] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,903] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,903] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,904] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,904] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,904] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,906] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,909] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,915] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,915] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-26 01:58:52,921] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,927] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,932] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,932] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,933] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-26 01:58:52,934] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step82500/zero_pp_rank_8_mp_rank_01_optim_states.pt successfully saved checkpoint at iteration 82500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2750.69 iteration 82600/ 152972 | consumed samples: 37211584 | consumed tokens: 76209324032 | elapsed time per iteration (ms): 4652.1 | learning rate: 1.033E-04 | global batch size: 512 | lm loss: 1.491284E+00 | loss scale: 65536.0 | grad norm: 5903.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 82800/ 152972 | consumed samples: 37313984 | consumed tokens: 76419039232 | elapsed time per iteration (ms): 4639.2 | learning rate: 1.029E-04 | global batch size: 512 | lm loss: 1.456051E+00 | loss scale: 65536.0 | grad norm: 5735.583 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 83000/ 152972 | consumed samples: 37416384 | consumed 
tokens: 76628754432 | elapsed time per iteration (ms): 4640.7 | learning rate: 1.024E-04 | global batch size: 512 | lm loss: 1.486128E+00 | loss scale: 65536.0 | grad norm: 4426.798 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 83000 | lm loss value: 1.451559E+00 | lm loss PPL: 4.269765E+00 |
-------------------------------------------------------------------------------------------
iteration 83200/ 152972 | consumed samples: 37518784 | consumed tokens: 76838469632 | elapsed time per iteration (ms): 5226.2 | learning rate: 1.020E-04 | global batch size: 512 | lm loss: 1.440591E+00 | loss scale: 131072.0 | grad norm: 11717.592 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 83400/ 152972 | consumed samples: 37621184 | consumed tokens: 77048184832 | elapsed time per iteration (ms): 4647.2 | learning rate: 1.016E-04 | global batch size: 512 | lm loss: 1.440060E+00 | loss scale: 131072.0 | grad norm: 9793.601 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 83600/ 152972 | consumed samples: 37723584 | consumed tokens: 77257900032 | elapsed time per iteration (ms): 4638.1 | learning rate: 1.012E-04 | global batch size: 512 | lm loss: 1.506807E+00 | loss scale: 65536.0 | grad norm: 7848.128 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 83800/ 152972 | consumed samples: 37825984 | consumed tokens: 77467615232 | elapsed time per iteration (ms): 4651.7 | learning rate: 1.008E-04 | global batch size: 512 | lm loss: 1.534896E+00 | loss scale: 65536.0 | grad norm: 6443.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-26 03:56:53,257] [INFO] [logging.py:68:log_dist] [Rank 0] step=84000, skipped=173, lr=[0.00010035953548602693,
0.00010035953548602693], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 84000 loss: 1.1764 iter time (s): 0.002 samples/sec: 219804.472 iteration 84000/ 152972 | consumed samples: 37928384 | consumed tokens: 77677330432 | elapsed time per iteration (ms): 4640.4 | learning rate: 1.004E-04 | global batch size: 512 | lm loss: 1.506379E+00 | loss scale: 131072.0 | grad norm: 12068.162 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------- valid loss at iteration 84000 | lm loss value: 1.503178E+00 | lm loss PPL: 4.495957E+00 | ------------------------------------------------------------------------------------------- saving checkpoint at iteration 84000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-26 03:58:43,215] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/mp_rank_00_model_states.pt [2021-11-26 03:58:43,639] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,640] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,643] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,646] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,646] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,646] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,646] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,647] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,647] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,647] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,648] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,648] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,648] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,649] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,650] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,650] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,651] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,651] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,652] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,652] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,652] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,653] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,654] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,655] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,656] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,658] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,659] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,662] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,663] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,664] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,668] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,669] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,672] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,673] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,673] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,674] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,676] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,676] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,679] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,680] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,680] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,681] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,683] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,686] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,687] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,687] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,689] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,689] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,690] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,691] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,691] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,691] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,693] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,696] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,696] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,698] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,698] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,702] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-26 03:58:43,703] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,704] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,704] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,705] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,708] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-26 03:58:43,712] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step84000/zero_pp_rank_13_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 84000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2697.05 iteration 84200/ 152972 | consumed samples: 38030784 | consumed tokens: 77887045632 | elapsed time per iteration (ms): 5202.0 | learning rate: 9.994E-05 | global batch size: 512 | lm loss: 1.494838E+00 | loss scale: 131072.0 | grad norm: 15939.243 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 84400/ 152972 | consumed samples: 38133184 | consumed tokens: 78096760832 | elapsed time per iteration (ms): 4653.6 | learning rate: 9.952E-05 | global batch size: 512 | lm loss: 1.476911E+00 | loss scale: 131072.0 | grad norm: 9827.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 84600/ 152972 | consumed samples: 38235584 | consumed 
tokens: 78306476032 | elapsed time per iteration (ms): 4653.5 | learning rate: 9.911E-05 | global batch size: 512 | lm loss: 1.472570E+00 | loss scale: 262144.0 | grad norm: 34761.496 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 84800/ 152972 | consumed samples: 38337984 | consumed tokens: 78516191232 | elapsed time per iteration (ms): 4646.0 | learning rate: 9.870E-05 | global batch size: 512 | lm loss: 1.500965E+00 | loss scale: 65536.0 | grad norm: 7378.883 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 85000/ 152972 | consumed samples: 38440384 | consumed tokens: 78725906432 | elapsed time per iteration (ms): 4641.9 | learning rate: 9.828E-05 | global batch size: 512 | lm loss: 1.499766E+00 | loss scale: 65536.0 | grad norm: 6605.138 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------- valid loss at iteration 85000 | lm loss value: 1.524573E+00 | lm loss PPL: 4.593182E+00 | ------------------------------------------------------------------------------------------- iteration 85200/ 152972 | consumed samples: 38542784 | consumed tokens: 78935621632 | elapsed time per iteration (ms): 5175.0 | learning rate: 9.786E-05 | global batch size: 512 | lm loss: 1.473807E+00 | loss scale: 32768.0 | grad norm: 2865.179 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 85400/ 152972 | consumed samples: 38645184 | consumed tokens: 79145336832 | elapsed time per iteration (ms): 4636.0 | learning rate: 9.745E-05 | global batch size: 512 | lm loss: 1.496287E+00 | loss scale: 32768.0 | grad norm: 3540.007 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | saving checkpoint at iteration 85500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-26 
05:56:40,853] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/mp_rank_00_model_states.pt [2021-11-26 05:56:41,276] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,280] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,281] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,282] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,282] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,284] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,285] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,285] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,286] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,287] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,288] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,289] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,289] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,290] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,291] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,291] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,291] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,291] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,291] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,292] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,292] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,292] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,293] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,295] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,295] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,295] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,296] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,298] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,299] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,300] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,301] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,305] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,306] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,308] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,310] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,311] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,315] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,315] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,316] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,317] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,318] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,319] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,323] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,323] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,327] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,327] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,328] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,329] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,329] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,329] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,330] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,331] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,332] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,333] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,335] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,337] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,337] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,337] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,337] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,339] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,342] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,348] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-26 05:56:41,350] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-26 05:56:41,350] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step85500/zero_pp_rank_13_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 85500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2699.35 iteration 85600/ 152972 | consumed samples: 38747584 | consumed tokens: 79355052032 | elapsed time per iteration (ms): 4649.0 | learning rate: 9.703E-05 | global batch size: 512 | lm loss: 1.489968E+00 | loss scale: 65536.0 | grad norm: 6438.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 85800/ 152972 | consumed samples: 38849984 | consumed tokens: 79564767232 | elapsed time per iteration (ms): 4648.9 | learning rate: 9.661E-05 | global batch size: 512 | lm loss: 1.458793E+00 | loss scale: 65536.0 | grad norm: 6518.303 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [2021-11-26 06:35:22,392] [INFO] [logging.py:68:log_dist] [Rank 0] step=86000, skipped=178, lr=[9.619768024254082e-05, 9.619768024254082e-05], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 86000 loss: 1.6078 iter time (s): 0.002 samples/sec: 220703.406 iteration 86000/ 152972 | consumed samples: 38952384 | consumed tokens: 79774482432 | elapsed time per iteration (ms): 4639.8 | learning rate: 9.620E-05 | global batch size: 512 | lm loss: 1.479524E+00 | loss scale: 65536.0 | grad norm: 7209.157 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------- valid loss at iteration 86000 | lm loss value: 1.488432E+00 | lm loss PPL: 4.430143E+00 | ------------------------------------------------------------------------------------------- iteration 86200/ 152972 | consumed samples: 39054784 | consumed tokens: 79984197632 | elapsed time per iteration (ms): 5182.1 | learning rate: 9.578E-05 | global batch 
size: 512 | lm loss: 1.469432E+00 | loss scale: 65536.0 | grad norm: 8272.405 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 86400/ 152972 | consumed samples: 39157184 | consumed tokens: 80193912832 | elapsed time per iteration (ms): 4647.7 | learning rate: 9.537E-05 | global batch size: 512 | lm loss: 1.468412E+00 | loss scale: 131072.0 | grad norm: 10167.714 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 86600/ 152972 | consumed samples: 39259584 | consumed tokens: 80403628032 | elapsed time per iteration (ms): 4639.9 | learning rate: 9.495E-05 | global batch size: 512 | lm loss: 1.492968E+00 | loss scale: 131072.0 | grad norm: 9084.834 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 86800/ 152972 | consumed samples: 39361984 | consumed tokens: 80613343232 | elapsed time per iteration (ms): 4633.3 | learning rate: 9.454E-05 | global batch size: 512 | lm loss: 1.486661E+00 | loss scale: 65536.0 | grad norm: 7473.034 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 87000/ 152972 | consumed samples: 39464384 | consumed tokens: 80823058432 | elapsed time per iteration (ms): 4637.4 | learning rate: 9.412E-05 | global batch size: 512 | lm loss: 1.441422E+00 | loss scale: 65536.0 | grad norm: 4885.592 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------- valid loss at iteration 87000 | lm loss value: 1.420729E+00 | lm loss PPL: 4.140137E+00 | ------------------------------------------------------------------------------------------- saving checkpoint at iteration 87000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-26 07:56:21,232] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/mp_rank_00_model_states.pt [2021-11-26 07:56:21,650] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-26 07:56:21,650] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-26 07:56:21,655] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-26 07:56:21,657] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-26 07:56:21,657] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-26 07:56:21,657] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-26 07:56:21,658] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-26 07:56:21,659] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_17_mp_rank_01_optim_states.pt
[2021-11-26 07:56:21,659] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_26_mp_rank_01_optim_states.pt
[2021-11-26 07:56:21,659] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_16_mp_rank_01_optim_states.pt
[2021-11-26 07:56:21,660] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_23_mp_rank_00_optim_states.pt
[2021-11-26 07:56:21,661] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_10_mp_rank_01_optim_states.pt
[2021-11-26 07:56:21,661] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_2_mp_rank_00_optim_states.pt
[2021-11-26 07:56:21,661] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_6_mp_rank_01_optim_states.pt
[2021-11-26 07:56:21,662] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_15_mp_rank_00_optim_states.pt
[2021-11-26 07:56:21,663] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_14_mp_rank_01_optim_states.pt
[2021-11-26 07:56:21,663] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_27_mp_rank_00_optim_states.pt
[2021-11-26 07:56:21,663] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_31_mp_rank_01_optim_states.pt
[2021-11-26 07:56:21,663] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_22_mp_rank_00_optim_states.pt
[2021-11-26 07:56:21,663] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_18_mp_rank_00_optim_states.pt
[2021-11-26 07:56:21,663] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_19_mp_rank_00_optim_states.pt
[2021-11-26 07:56:21,664] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_22_mp_rank_01_optim_states.pt
[2021-11-26 07:56:21,665] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_9_mp_rank_01_optim_states.pt
[2021-11-26 07:56:21,666] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_5_mp_rank_00_optim_states.pt
[2021-11-26 07:56:21,668] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_30_mp_rank_01_optim_states.pt
[2021-11-26 07:56:21,668] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_21_mp_rank_01_optim_states.pt
[2021-11-26 07:56:21,668] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_28_mp_rank_00_optim_states.pt
[2021-11-26 07:56:21,670] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_9_mp_rank_00_optim_states.pt
[2021-11-26 07:56:21,678] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_8_mp_rank_00_optim_states.pt
[2021-11-26 07:56:21,678] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_26_mp_rank_00_optim_states.pt
[2021-11-26 07:56:21,678] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_5_mp_rank_01_optim_states.pt
[2021-11-26 07:56:21,682] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_8_mp_rank_01_optim_states.pt
[2021-11-26 07:56:21,685] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_25_mp_rank_00_optim_states.pt
[2021-11-26 07:56:21,685] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_7_mp_rank_00_optim_states.pt
[2021-11-26 07:56:21,686] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_3_mp_rank_01_optim_states.pt
[2021-11-26 07:56:21,687] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_27_mp_rank_01_optim_states.pt
[2021-11-26 07:56:21,689] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_29_mp_rank_00_optim_states.pt
[2021-11-26 07:56:21,690] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_20_mp_rank_01_optim_states.pt
[2021-11-26 07:56:21,690] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_17_mp_rank_00_optim_states.pt
[2021-11-26 07:56:21,692] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_18_mp_rank_01_optim_states.pt
[2021-11-26 07:56:21,692] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_1_mp_rank_00_optim_states.pt
[2021-11-26 07:56:21,693] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_25_mp_rank_01_optim_states.pt
[2021-11-26 07:56:21,694] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_6_mp_rank_00_optim_states.pt
[2021-11-26 07:56:21,694] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_24_mp_rank_00_optim_states.pt
[2021-11-26 07:56:21,698] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_12_mp_rank_00_optim_states.pt
[2021-11-26 07:56:21,699] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_23_mp_rank_01_optim_states.pt
[2021-11-26 07:56:21,699] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_29_mp_rank_01_optim_states.pt
[2021-11-26 07:56:21,700] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_1_mp_rank_01_optim_states.pt
[2021-11-26 07:56:21,700] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_16_mp_rank_00_optim_states.pt
[2021-11-26 07:56:21,704] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_19_mp_rank_01_optim_states.pt
[2021-11-26 07:56:21,705] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_20_mp_rank_00_optim_states.pt
[2021-11-26 07:56:21,708] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_28_mp_rank_01_optim_states.pt
[2021-11-26 07:56:21,710] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_7_mp_rank_01_optim_states.pt
[2021-11-26 07:56:21,710] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_4_mp_rank_01_optim_states.pt
[2021-11-26 07:56:21,710] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_15_mp_rank_01_optim_states.pt
[2021-11-26 07:56:21,714] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_11_mp_rank_00_optim_states.pt
[2021-11-26 07:56:21,715] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_21_mp_rank_00_optim_states.pt
[2021-11-26 07:56:21,716] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_2_mp_rank_01_optim_states.pt
[2021-11-26 07:56:21,720] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_13_mp_rank_00_optim_states.pt
[2021-11-26 07:56:21,721] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_11_mp_rank_01_optim_states.pt
[2021-11-26 07:56:21,723] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_12_mp_rank_01_optim_states.pt
[2021-11-26 07:56:21,724] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_10_mp_rank_00_optim_states.pt
[2021-11-26 07:56:21,728] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_30_mp_rank_00_optim_states.pt
[2021-11-26 07:56:21,733] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step87000/zero_pp_rank_0_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 87000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2763.52
iteration 87200/ 152972 | consumed samples: 39566784 | consumed tokens: 81032773632 | elapsed time per iteration (ms): 5205.7 | learning rate: 9.371E-05 | global batch size: 512 | lm loss: 1.447297E+00 | loss scale: 65536.0 | grad norm: 8030.938 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 87400/ 152972 | consumed samples: 39669184 | consumed tokens: 81242488832 | elapsed time per iteration (ms): 4627.7 | learning rate: 9.329E-05 | global batch size: 512 | lm loss: 1.540058E+00 | loss scale: 131072.0 | grad norm: 23037.125 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 87600/ 152972 | consumed samples: 39771584 | consumed tokens: 81452204032 | elapsed time per iteration (ms): 4630.7 | learning rate: 9.288E-05 | global batch size: 512 | lm loss: 1.500885E+00 | loss scale: 65536.0 | grad norm: 8065.074 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 87800/ 152972 | consumed samples: 39873984 | consumed tokens: 81661919232 | elapsed time per iteration (ms): 4639.1 | learning rate: 9.247E-05 | global batch size: 512 | lm loss: 1.446719E+00 | loss scale: 65536.0 | grad norm: 5238.787 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-26 09:13:38,887] [INFO] [logging.py:68:log_dist] [Rank 0] step=88000, skipped=182, lr=[9.205073166003089e-05, 9.205073166003089e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 88000 loss: 1.3111 iter time (s): 0.002 samples/sec: 219710.819
iteration 88000/ 152972 | consumed samples: 39976384 | consumed tokens: 81871634432 | elapsed time per iteration
(ms): 4638.9 | learning rate: 9.205E-05 | global batch size: 512 | lm loss: 1.459576E+00 | loss scale: 131072.0 | grad norm: 12255.926 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 88000 | lm loss value: 1.424521E+00 | lm loss PPL: 4.155869E+00 |
-------------------------------------------------------------------------------------------
iteration 88200/ 152972 | consumed samples: 40078784 | consumed tokens: 82081349632 | elapsed time per iteration (ms): 5185.2 | learning rate: 9.164E-05 | global batch size: 512 | lm loss: 1.472384E+00 | loss scale: 131072.0 | grad norm: 15221.051 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 88400/ 152972 | consumed samples: 40181184 | consumed tokens: 82291064832 | elapsed time per iteration (ms): 4641.4 | learning rate: 9.122E-05 | global batch size: 512 | lm loss: 1.450245E+00 | loss scale: 131072.0 | grad norm: 12082.784 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 88500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-26 09:54:10,742] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/mp_rank_00_model_states.pt
[2021-11-26 09:54:11,165] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_3_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,168] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_28_mp_rank_01_optim_states.pt
[2021-11-26
09:54:11,169] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_27_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,170] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_13_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,170] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_27_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,170] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_14_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,172] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_15_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,172] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_1_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,172] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_23_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,174] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_18_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,175] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_2_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,175] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_22_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,175] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_29_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,175] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_29_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,176] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_17_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,176] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_10_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,178] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_7_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,179] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_26_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,179] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_6_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,181] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_15_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,181] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,181] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_28_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,182] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_6_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,182] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_25_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,183] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_5_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,184] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_23_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,187] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_20_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,188] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_11_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,189] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_9_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,194] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_11_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,201] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_20_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,203] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_22_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,203] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_24_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,205] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_17_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,205] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_30_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,206] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_19_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,207] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_31_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,207] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_12_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,209] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_19_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,209] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_16_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,210] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_13_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,212] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_7_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,214] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_14_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,214] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_26_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,215] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_8_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,215] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_5_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,216] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_9_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,217] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_18_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,219] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_16_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,219] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_21_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,220] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_21_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,220] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_4_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,220] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_4_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,223] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_25_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,223] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_2_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,225] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_24_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,229] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_1_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,229] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_30_mp_rank_00_optim_states.pt
[2021-11-26 09:54:11,231] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_12_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,235] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_8_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,235] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_10_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,236] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_3_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,237] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_0_mp_rank_01_optim_states.pt
[2021-11-26 09:54:11,239] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step88500/zero_pp_rank_31_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 88500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2725.94
iteration 88600/ 152972 | consumed samples: 40283584 | consumed tokens: 82500780032 | elapsed time per iteration (ms): 4656.4 | learning rate: 9.081E-05 | global batch size: 512 | lm loss: 1.511662E+00 | loss scale: 131072.0 | grad norm: 14254.072 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 88800/ 152972 | consumed samples: 40385984 | consumed tokens: 82710495232 | elapsed time per iteration (ms): 4643.1 | learning rate: 9.040E-05 |
global batch size: 512 | lm loss: 1.450496E+00 | loss scale: 65536.0 | grad norm: 6452.878 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 89000/ 152972 | consumed samples: 40488384 | consumed tokens: 82920210432 | elapsed time per iteration (ms): 4652.8 | learning rate: 8.999E-05 | global batch size: 512 | lm loss: 1.468244E+00 | loss scale: 65536.0 | grad norm: 8136.797 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------- valid loss at iteration 89000 | lm loss value: 1.408505E+00 | lm loss PPL: 4.089835E+00 | ------------------------------------------------------------------------------------------- iteration 89200/ 152972 | consumed samples: 40590784 | consumed tokens: 83129925632 | elapsed time per iteration (ms): 5196.6 | learning rate: 8.958E-05 | global batch size: 512 | lm loss: 1.466055E+00 | loss scale: 65536.0 | grad norm: 10253.553 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 89400/ 152972 | consumed samples: 40693184 | consumed tokens: 83339640832 | elapsed time per iteration (ms): 4645.1 | learning rate: 8.916E-05 | global batch size: 512 | lm loss: 1.520631E+00 | loss scale: 131072.0 | grad norm: 16674.073 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 89600/ 152972 | consumed samples: 40795584 | consumed tokens: 83549356032 | elapsed time per iteration (ms): 4633.7 | learning rate: 8.875E-05 | global batch size: 512 | lm loss: 1.466066E+00 | loss scale: 131072.0 | grad norm: 14390.336 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 89800/ 152972 | consumed samples: 40897984 | consumed tokens: 83759071232 | elapsed time per iteration (ms): 4653.6 | learning rate: 8.834E-05 | global batch size: 512 | lm loss: 1.452053E+00 | loss scale: 131072.0 | 
grad norm: 11327.274 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [2021-11-26 11:52:13,237] [INFO] [logging.py:68:log_dist] [Rank 0] step=90000, skipped=186, lr=[8.792878582063333e-05, 8.792878582063333e-05], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 90000 loss: 0.9110 iter time (s): 0.002 samples/sec: 220602.051 iteration 90000/ 152972 | consumed samples: 41000384 | consumed tokens: 83968786432 | elapsed time per iteration (ms): 4663.9 | learning rate: 8.793E-05 | global batch size: 512 | lm loss: 1.469563E+00 | loss scale: 262144.0 | grad norm: 19361.273 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | ------------------------------------------------------------------------------------------- valid loss at iteration 90000 | lm loss value: 1.536308E+00 | lm loss PPL: 4.647399E+00 | ------------------------------------------------------------------------------------------- saving checkpoint at iteration 90000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-26 11:54:04,778] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/mp_rank_00_model_states.pt [2021-11-26 11:54:05,203] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-26 11:54:05,206] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-26 11:54:05,207] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-26 11:54:05,209] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-26 11:54:05,209] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-26 11:54:05,209] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-26 11:54:05,210] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-26 11:54:05,210] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-26 11:54:05,210] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-26 11:54:05,211] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-26 11:54:05,211] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-26 11:54:05,213] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-26 11:54:05,214] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-26 11:54:05,214] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-26 11:54:05,214] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-26 11:54:05,215] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-26 11:54:05,215] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-26 11:54:05,216] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-26 11:54:05,217] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-26 11:54:05,217] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-26 11:54:05,218] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-26 11:54:05,222] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-26 11:54:05,223] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-26 11:54:05,223] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-26 11:54:05,225] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-26 11:54:05,225] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-26 11:54:05,226] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-26 11:54:05,227] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-26 11:54:05,232] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-26 11:54:05,232] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-26 11:54:05,233] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-26 11:54:05,236] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-26 11:54:05,238] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-26 11:54:05,238] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-26 11:54:05,239] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-26 11:54:05,240] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-26 11:54:05,243] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-26 11:54:05,244] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-26 11:54:05,244] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-26 11:54:05,246] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-26 11:54:05,247] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-26 11:54:05,248] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-26 11:54:05,248] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-26 11:54:05,251] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-26 11:54:05,254] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-26 11:54:05,254] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-26 11:54:05,257] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-26 11:54:05,259] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-26 11:54:05,260] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-26 11:54:05,261] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-26 11:54:05,262] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-26 11:54:05,263] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-26 11:54:05,263] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-26 11:54:05,269] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-26 11:54:05,271] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-26 11:54:05,271] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-26 11:54:05,271] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-26 11:54:05,271] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-26 11:54:05,273] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_6_mp_rank_01_optim_states.pt
[2021-11-26 11:54:05,274] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_5_mp_rank_00_optim_states.pt
[2021-11-26 11:54:05,282] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_21_mp_rank_00_optim_states.pt
[2021-11-26 11:54:05,283] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_1_mp_rank_01_optim_states.pt
[2021-11-26 11:54:05,285] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2021-11-26 11:54:05,293] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step90000/zero_pp_rank_11_mp_rank_01_optim_states.pt
successfully saved checkpoint at iteration 90000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2814.19
iteration 90200/ 152972 | consumed samples: 41102784 | consumed tokens: 84178501632 | elapsed time per iteration (ms): 5210.6 | learning rate: 8.752E-05 | global batch size: 512 | lm loss: 1.490737E+00 | loss scale: 131072.0 | grad norm: 9987.570 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 90400/ 152972 | consumed samples: 41205184 | consumed tokens: 84388216832 | elapsed time per iteration (ms): 4645.9 | learning rate: 8.711E-05 | global batch size: 512 | lm loss: 1.491468E+00 | loss scale: 131072.0 | grad norm: 11903.332 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 90600/ 152972 | consumed samples: 41307584 | consumed tokens: 84597932032 | elapsed time per iteration (ms): 4640.9 | learning rate: 8.670E-05 | global batch size: 512 | lm loss: 1.467113E+00 | loss scale: 131072.0 | grad norm: 13633.733 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 90800/ 152972 | consumed samples: 41409984 | consumed tokens: 84807647232 | elapsed time per iteration (ms): 4674.5 | learning rate: 8.629E-05 | global batch size: 512 | lm loss: 1.446134E+00 | loss scale: 262144.0 | grad norm: 28938.068 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 91000/ 152972 | consumed samples: 41512384 | consumed tokens: 85017362432 | elapsed time per iteration (ms): 4642.3 | learning rate: 8.588E-05 | global batch size: 512 | lm loss: 1.457133E+00 | loss scale: 131072.0 | grad norm: 6701.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 91000 | lm loss value: 1.451088E+00 | lm loss PPL: 4.267754E+00 |
-------------------------------------------------------------------------------------------
iteration 91200/ 152972 | consumed samples: 41614784 | consumed tokens: 85227077632 | elapsed time per iteration (ms): 5193.8 | learning rate: 8.548E-05 | global batch size: 512 | lm loss: 1.466079E+00 | loss scale: 65536.0 | grad norm: 6527.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 91400/ 152972 | consumed samples: 41717184 | consumed tokens: 85436792832 | elapsed time per iteration (ms): 4659.2 | learning rate: 8.507E-05 | global batch size: 512 | lm loss: 1.447940E+00 | loss scale: 65536.0 | grad norm: 8142.241 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 91500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-26 13:52:12,016] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/mp_rank_00_model_states.pt
[2021-11-26 13:52:12,442] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_11_mp_rank_00_optim_states.pt
[2021-11-26 13:52:12,443] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_3_mp_rank_00_optim_states.pt
[2021-11-26 13:52:12,443] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_25_mp_rank_01_optim_states.pt
[2021-11-26 13:52:12,446] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_19_mp_rank_00_optim_states.pt
[2021-11-26 13:52:12,446] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_15_mp_rank_00_optim_states.pt
[2021-11-26 13:52:12,447] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_16_mp_rank_00_optim_states.pt
[2021-11-26 13:52:12,447] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,448] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,449] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-26 13:52:12,450] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,450] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-26 13:52:12,451] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,451] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-26 13:52:12,451] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,452] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,452] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-26 13:52:12,453] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-26 13:52:12,453] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-26 13:52:12,454] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-26 13:52:12,455] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,455] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,456] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,456] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-26 13:52:12,457] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,457] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,457] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-26 13:52:12,459] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,462] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-26 13:52:12,468] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,468] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-26 13:52:12,474] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,475] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-26 13:52:12,479] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,480] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-26 13:52:12,481] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-26 13:52:12,481] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,481] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-26 13:52:12,482] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-26 13:52:12,482] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,482] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,482] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,483] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,485] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,486] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-26 13:52:12,487] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-26 13:52:12,487] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-26 13:52:12,487] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,487] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-26 13:52:12,488] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-26 13:52:12,488] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-26 13:52:12,488] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-26 13:52:12,489] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,489] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-26 13:52:12,490] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,491] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,492] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,493] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-26 13:52:12,494] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,496] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,498] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,499] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-26 13:52:12,501] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-26 13:52:12,503] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_22_mp_rank_00_optim_states.pt
[2021-11-26 13:52:12,508] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step91500/zero_pp_rank_0_mp_rank_01_optim_states.pt
successfully saved checkpoint at iteration 91500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2708.14
iteration 91600/ 152972 | consumed samples: 41819584 | consumed tokens: 85646508032 | elapsed time per iteration (ms): 4644.3 | learning rate: 8.466E-05 | global batch size: 512 | lm loss: 1.483770E+00 | loss scale: 32768.0 | grad norm: 4141.200 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 91800/ 152972 | consumed samples: 41921984 | consumed tokens: 85856223232 | elapsed time per iteration (ms): 4627.4 | learning rate: 8.425E-05 | global batch size: 512 | lm loss: 1.465101E+00 | loss scale: 32768.0 | grad norm: 3878.033 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-26 14:30:46,671] [INFO] [logging.py:68:log_dist] [Rank 0] step=92000, skipped=192, lr=[8.384387951392326e-05, 8.384387951392326e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 92000 loss: 1.5498 iter time (s): 0.002 samples/sec: 219664.276
iteration 92000/ 152972 | consumed samples: 42024384 | consumed tokens: 86065938432 | elapsed time per iteration (ms): 4628.4 | learning rate: 8.384E-05 | global batch size: 512 | lm loss: 1.417543E+00 | loss scale: 65536.0 | grad norm: 7193.985 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 92000 | lm loss value: 1.423522E+00 | lm loss PPL: 4.151719E+00 |
-------------------------------------------------------------------------------------------
iteration 92200/ 152972 | consumed samples: 42126784 | consumed tokens: 86275653632 | elapsed time per iteration (ms): 5184.1 | learning rate: 8.344E-05 | global batch size: 512 | lm loss: 1.487518E+00 | loss scale: 65536.0 | grad norm: 7088.350 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 92400/ 152972 | consumed samples: 42229184 | consumed tokens: 86485368832 | elapsed time per iteration (ms): 4644.9 | learning rate: 8.303E-05 | global batch size: 512 | lm loss: 1.421809E+00 | loss scale: 65536.0 | grad norm: 8671.586 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 92600/ 152972 | consumed samples: 42331584 | consumed tokens: 86695084032 | elapsed time per iteration (ms): 4637.7 | learning rate: 8.263E-05 | global batch size: 512 | lm loss: 1.405928E+00 | loss scale: 32768.0 | grad norm: 2753.829 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 92800/ 152972 | consumed samples: 42433984 | consumed tokens: 86904799232 | elapsed time per iteration (ms): 4637.6 | learning rate: 8.222E-05 | global batch size: 512 | lm loss: 1.493733E+00 | loss scale: 32768.0 | grad norm: 3095.154 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 93000/ 152972 | consumed samples: 42536384 | consumed tokens: 87114514432 | elapsed time per iteration (ms): 4643.7 | learning rate: 8.181E-05 | global batch size: 512 | lm loss: 1.429464E+00 | loss scale: 32768.0 | grad norm: 2951.043 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 93000 | lm loss value: 1.478823E+00 | lm loss PPL: 4.387779E+00 |
------------------------------------------------------------------------------------------- saving checkpoint at iteration 93000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-26 15:51:46,718] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/mp_rank_00_model_states.pt [2021-11-26 15:51:47,143] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,143] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,147] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,147] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,150] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,151] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,152] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,152] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,152] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,153] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,153] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,153] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,154] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,156] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,156] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,157] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,157] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,160] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,160] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,160] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,160] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,161] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,162] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,164] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,165] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,165] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,169] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,170] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,170] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,170] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,176] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,178] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,180] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,181] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,182] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,183] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,183] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,184] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,184] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,187] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,187] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,187] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,187] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,188] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,188] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,189] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,190] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,191] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,192] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,192] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,192] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,193] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,194] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,195] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,196] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,196] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,197] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,201] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,201] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,203] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,205] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-26 15:51:47,206] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-26 15:51:47,208] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_0_mp_rank_01_optim_states.pt
[2021-11-26 15:51:47,209] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93000/zero_pp_rank_1_mp_rank_01_optim_states.pt
successfully saved checkpoint at iteration 93000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2768.87
iteration 93200/ 152972 | consumed samples: 42638784 | consumed tokens: 87324229632 | elapsed time per iteration (ms): 5214.7 | learning rate: 8.141E-05 | global batch size: 512 | lm loss: 1.456170E+00 | loss scale: 65536.0 | grad norm: 6013.702 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 93400/ 152972 | consumed samples: 42741184 | consumed tokens: 87533944832 | elapsed time per iteration (ms): 4645.9 | learning rate: 8.100E-05 | global batch size: 512 | lm loss: 1.457245E+00 | loss scale: 65536.0 | grad norm: 5967.587 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 93600/ 152972 | consumed samples: 42843584 | consumed tokens: 87743660032 | elapsed time per iteration (ms): 4643.0 | learning rate: 8.060E-05 | global batch size: 512 | lm loss: 1.416812E+00 | loss scale: 131072.0 | grad norm: 12605.921 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 93800/ 152972 | consumed samples: 42945984 | consumed tokens: 87953375232 | elapsed time per iteration (ms): 4646.2 | learning rate: 8.020E-05 | global batch size: 512 | lm loss: 1.452070E+00 | loss scale: 131072.0 | grad norm: 14279.357 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 93876 to
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-26 16:59:41,370] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/mp_rank_00_model_states.pt [2021-11-26 16:59:41,819] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,824] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,826] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,828] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,829] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,829] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,832] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-26 
16:59:41,833] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,833] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,835] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,836] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,836] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,836] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,837] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,837] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,838] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,841] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,843] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,844] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,848] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,848] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,848] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,850] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,850] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,850] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,851] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,852] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,852] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,854] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,854] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,856] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,856] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,857] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,858] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,861] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,863] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,863] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,864] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,866] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,868] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,868] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,869] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,870] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,870] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,870] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,870] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,871] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,871] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,873] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,873] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,874] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,874] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,874] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,877] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,879] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,883] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,884] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,884] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,894] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-26 16:59:41,898] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,908] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,909] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,927] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-26 16:59:41,930] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step93876/zero_pp_rank_22_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 93876 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2780.37
[exiting program after 1190.0559914032617 minutes] datetime: 2021-11-26 16:59:42
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
--------------------------------------------------
[OKAY][OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- .. ---------------------------------------------------------------------------------------------------- compatible.. compatible-------------------------------------------------- -------------------------------------------------- cpu_adam ............... cpu_adam[YES]cpu_adamcpu_adam ................................................... [YES][YES][OKAY][YES] op nameop name .................. [OKAY][OKAY][OKAY] op name................................ ................op nameinstalled installedinstalled................ .. ....installed compatiblecompatiblecompatible.. ------------------------------------------------------------------------------------------------------------------------------------------------------ compatible cpu_adam ...............cpu_adam cpu_adam [YES]............... cpu_adam ..................... [YES] [YES]............... [OKAY] ......[YES]...... [OKAY]......[OKAY] -------------------------------------------------- fused_adam [OKAY]............. [YES] ...... fused_adam[OKAY] fused_adam ............. [YES] ......fused_adam fused_adam[OKAY] ..........................fused_adam [YES][YES]fused_lamb ............. ...... ................... [YES] [OKAY] [OKAY]......[YES] cpu_adamcpu_adam cpu_adam ...............cpu_adam............... [YES] .............................. [YES]...... [YES]...... [OKAY] [YES] [OKAY]...... .............fused_adam fused_adam [YES]fused_lamb............. ...................[YES]............. [OKAY] ......[OKAY]fused_lamb [OKAY]fused_lamb............. ......[YES][YES] fused_lamb............[OKAY] [OKAY][OKAY]............. fused_lamb.............[YES] .............[YES] ...... [YES] ......[OKAY]...... 
sparse_attn[OKAY][OKAY] ............ ......[OKAY] [OKAY] fused_lamb[YES] fused_lamb................... .............[YES][OKAY] [NO] ....... [OKAY] fused_adam fused_adam............. fused_adam ............. fused_adam ............. [YES][YES]............. ............ [YES][YES] [OKAY] [OKAY]...... [YES]...... ......sparse_attn[OKAY] [OKAY]............ ...... [OKAY][OKAY] fused_lambfused_lamb sparse_attntransformer ............sparse_attn............ [NO][YES]sparse_attn............ .........................[NO] [OKAY] [OKAY] [NO] ....... [OKAY] [NO] .............. transformer[OKAY][OKAY] stochastic_transformer ..........................fused_lamb [YES]fused_lamb [YES] ................... ................... [OKAY][YES] [YES] [OKAY] ...... ...... [OKAY][OKAY] sparse_attntransformer ........................ [NO]sparse_attn[YES] sparse_attn ................... ...... ............[OKAY][OKAY] [NO] ............. transformertransformer[YES][YES] .................................... [OKAY][YES] [YES] ......[OKAY] [OKAY]...... sparse_attn ............ sparse_attn[NO]sparse_attnsparse_attn ............................... ............[OKAY] [NO] .............. transformer[OKAY]stochastic_transformer [OKAY].............transformer [OKAY] [YES][YES]............ transformer ...... ...... [YES] [OKAY] ............[OKAY] ...... [YES] [OKAY]...... stochastic_transformer[OKAY] [NO][NO] [NO].......transformer ..........................[OKAY] [OKAY][YES] stochastic_transformer . stochastic_transformerstochastic_transformer[YES] ........ [OKAY][YES][YES] ............ [OKAY] stochastic_transformer. stochastic_transformer[YES]. ....... [YES] [YES] [OKAY] [OKAY] [OKAY] transformer...... transformer [OKAY]transformer............ ............ [OKAY][OKAY] ............ ............stochastic_transformer[YES][YES] .......[YES]...... [OKAY]...... [YES][OKAY] stochastic_transformer[OKAY]...... .[OKAY]stochastic_transformer [YES]stochastic_transformer ........ [YES] [YES][OKAY] ............ 
[OKAY][OKAY] ninjaninjaninjaninja .................. ......................................................[OKAY] [OKAY][OKAY][OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op name op nameop name ................op name ................ ................installed ................installed ..installedinstalled .. compatible..compatible.. --------------------------------------------------compatiblecompatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ............... cpu_adam[YES] ..................... cpu_adam[YES]cpu_adam[OKAY] ..................... [OKAY][YES] ...... [OKAY] fused_adam............... .............[YES]fused_adam ......[YES]............. fused_adam[OKAY] ......[YES]............. [OKAY][YES] ...... ......[OKAY] [OKAY]fused_lamb fused_adam fused_lamb..........................fused_lamb [YES] .............[YES]............. ......[YES][YES]...... ......[OKAY]...... [OKAY][OKAY] [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attnsparse_attn sparse_attn ............ ............ ............ [NO] [NO] [NO] ....... ....... ....... [OKAY] [OKAY] [OKAY]sparse_attn ninjaninjaninjaninja ...................................................... .................. [OKAY] transformer............transformertransformer ............ [NO] ............[YES]............ [YES].......[YES]...... ......[OKAY]...... [OKAY] [OKAY][OKAY][OKAY]-------------------------------------------------- -------------------------------------------------- [OKAY][OKAY] transformer --------------------------------------------------op name-------------------------------------------------- op name stochastic_transformer............ stochastic_transformerstochastic_transformer. [YES] . [YES]. ...... 
[YES] ......[OKAY][YES]...... ................op nameop name ................ installed ................................ installed.. installedinstalled ..compatible .. [OKAY] ......[OKAY] [OKAY] ..compatible -------------------------------------------------- compatible -------------------------------------------------- compatible -------------------------------------------------- -------------------------------------------------- stochastic_transformer . [YES] ...... [OKAY] cpu_adam ............... cpu_adam[YES] cpu_adamcpu_adam..................... ............... [OKAY]...............[YES][YES] ......[YES]...... [OKAY]......[OKAY] fused_adam [OKAY]............. [YES] ......fused_adam [OKAY]............. fused_adam[YES] fused_adamfused_lamb...... ............. .............[OKAY] ............. [YES] [YES] [YES] fused_lamb...... ...... ...................[OKAY][OKAY] [OKAY][YES] ...... fused_lamb[OKAY] fused_lamb............. .............[YES] [YES]...... ......sparse_attn[OKAY] [OKAY] ............sparse_attn [NO]............ .......[NO] [OKAY] .......transformer sparse_attn [OKAY] ............ sparse_attn............ transformer[YES]............ [NO] ...... [NO]................... [OKAY] [YES] ....... [OKAY]......[OKAY] stochastic_transformer[OKAY]transformer ............. transformerstochastic_transformer[YES] [YES]............. ...... ......[YES] [YES] [OKAY][OKAY] ...... ......[OKAY] stochastic_transformer[OKAY] . [YES] ......stochastic_transformer [OKAY]. [YES] ...... 
[OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja JIT compiled ops requires ninja JIT compiled ops requires ninja ninjaninja .................................... [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- op nameop name ................................ installedinstalled .... compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam .............................. [YES][YES] ............ [OKAY][OKAY] fused_adam fused_adam............. .............[YES] [YES]...... ...... [OKAY][OKAY] fused_lambfused_lamb .......................... [YES][YES] ............ [OKAY][OKAY] sparse_attnsparse_attn ........................ [NO][NO] .............. [OKAY] [OKAY] transformer transformer............ ............[YES] [YES]...... 
......[OKAY] [OKAY] stochastic_transformer stochastic_transformer. .[YES] [YES]...... ......[OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. 
[YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] ninjaninjaninjaninja .................. .................................... .................. [OKAY][OKAY] [OKAY][OKAY]---------------------------------------------------------------------------------------------------- --------------------------------------------------op nameop name-------------------------------------------------- ................op name................ op name installed installed .................................... installedcompatiblecompatible installed --------------------------------------------------..-------------------------------------------------- .. compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adam cpu_adam............... ...............[YES] [YES]cpu_adam......cpu_adam ...... [OKAY]............... ............... [OKAY] [YES] [YES] ............ [OKAY][OKAY] fused_adam .............fused_adam [YES]............. ......[YES] fused_adam [OKAY]fused_adam ...... ............. .............[OKAY] fused_lamb [YES] [YES]fused_lamb............. ...... ................... [YES] [OKAY][OKAY][YES]...... [OKAY]......fused_lamb [OKAY]fused_lamb ............. .............[YES] [YES] ............ [OKAY][OKAY] sparse_attn ............ [NO] sparse_attn....... [OKAY]............ [NO] .......transformer [OKAY]............sparse_attn sparse_attn [YES] transformer.............................. [NO] ................... [OKAY][NO][YES] [OKAY] ....... ...... stochastic_transformer[OKAY][OKAY] transformer . transformer............[YES] stochastic_transformer............[YES]...... .[YES][OKAY] ...... [YES] ............[OKAY] [OKAY][OKAY] stochastic_transformer . [YES]stochastic_transformer ....... [OKAY][YES] ...... 
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report---------------------------------------------------------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
JIT compiled ops requires ninja --------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] ninjaninjaninjaninja .................................... .................. ..................[OKAY] [OKAY] [OKAY] [OKAY] -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- --------------------------------------------------op nameop name op name op name ................ ................................................ installed ..installedinstalledinstalled compatible.. .... 
-------------------------------------------------- compatiblecompatible compatible ---------------------------------------------------------------------------------------------------- --------------------------------------------------cpu_adam ............... [YES] ...... [OKAY]cpu_adam cpu_adam cpu_adam ............... ............... ............... [YES]fused_adam [YES][YES] ...... ...... .............[OKAY] ......[OKAY] [YES] [OKAY] ...... [OKAY] fused_adam .............fused_adam fused_lamb [YES]fused_adam ................... .......................... [YES][OKAY][YES] [YES]...... ...... [OKAY] ...... fused_lamb[OKAY] [OKAY]fused_lamb............. [YES]............. ......fused_lamb[YES] [OKAY]................... sparse_attn [YES] [OKAY] ............ ...... [NO][OKAY] ....... [OKAY]sparse_attn ............ [NO] transformer....... ............sparse_attn[OKAY] sparse_attn[YES]............ transformer.................. [NO] ............ [NO][OKAY].......[YES] ....... [OKAY]...... stochastic_transformer [OKAY] [OKAY]transformer. ............[YES]transformer stochastic_transformer......[YES]............ . ...... [OKAY] [YES] [OKAY][YES] ............ [OKAY][OKAY]stochastic_transformer . stochastic_transformer[YES] ....... [YES][OKAY] ...... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. 
-------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... 
[OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... 
[OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... 
[OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 
0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------
DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------
DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO]async_io ............... [NO] ....... [NO] transformer_inference .. [NO] transformer_inference....... ..[OKAY] [NO] ....... [OKAY]utils .................. [YES] ...... utils[OKAY] .................. [YES] quantizer...... ..............[OKAY] [NO] ....... [OKAY] quantizer .............. [NO]-------------------------------------------------- ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 
1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY]  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 
11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO]async_io ...................... [NO] [NO] ....... [NO] transformer_inference .. [NO]transformer_inference ......... [OKAY][NO] ....... [OKAY] utils .................. [YES]utils ........................ [OKAY][YES] ...... quantizer[OKAY] .............. 
[NO] .......quantizer [OKAY].............. [NO] ....... --------------------------------------------------[OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda versionDeepSpeed general environment info: ............... 11.1 nvcc version ..................... torch install path11.2 deepspeed install path............... ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ...................['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w.torch version .......................... torch 1.8, cuda 11.11.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 
11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info:DeepSpeed general environment info: torch install pathtorch install path ............... ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch versiontorch version ........................................ 1.8.2 1.8.2 torch cuda versiontorch cuda version .............................. 11.111.1 nvcc versionnvcc version .......................................... 11.211.2 deepspeed install pathdeepspeed install path ...................... 
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed infodeepspeed info ...................................... 0.5.5+58a8e13, 58a8e13, master0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w.deepspeed wheel compiled w. ............ torch 1.8, cuda 11.1torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 
git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info
for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
> setting tensorboard ...
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version .....................
11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** using world size: 64, data-parallel-size: 32, tensor-model-parallel size: 1, pipeline-model-parallel size: 2 using torch.float16 for parameters ... ------------------------ arguments ------------------------ accumulate_allreduce_grads_in_fp32 .............. False adam_beta1 ...................................... 0.9 adam_beta2 ...................................... 0.999 adam_eps ........................................ 1e-08 adlr_autoresume ................................. False adlr_autoresume_interval ........................ 1000 apply_query_key_layer_scaling ................... True apply_residual_connection_post_layernorm ........ False attention_dropout ............................... 0.1 attention_softmax_in_fp32 ....................... False bert_binary_head ................................ True bert_load ....................................... None bf16 ............................................ False bias_dropout_fusion ............................. 
True bias_gelu_fusion ................................ True biencoder_projection_dim ........................ 0 biencoder_shared_query_context_model ............ False block_data_path ................................. None checkpoint_activations .......................... True checkpoint_in_cpu ............................... False checkpoint_num_layers ........................... 1 clip_grad ....................................... 1.0 codecarbon_dir .................................. None consumed_train_samples .......................... 0 consumed_train_tokens ........................... 0 consumed_valid_samples .......................... 0 contigious_checkpointing ........................ False cpu_optimizer ................................... False cpu_torch_adam .................................. False curriculum_learning ............................. False data_impl ....................................... mmap data_parallel_size .............................. 32 data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document'] dataloader_type ................................. single DDP_impl ........................................ local decoder_seq_length .............................. None deepscale ....................................... False deepscale_config ................................ None deepspeed ....................................... True deepspeed_activation_checkpointing .............. True deepspeed_config ................................ ./ds_config.24149.json deepspeed_mpi ................................... False distribute_checkpointed_activations ............. False distributed_backend ............................. nccl embedding_path .................................. None encoder_seq_length .............................. 2048 eod_mask_loss ................................... False eval_interval ................................... 
1000 eval_iters ...................................... 100 evidence_data_path .............................. None exit_duration_in_mins ........................... 1190 exit_interval ................................... None ffn_hidden_size ................................. 8192 finetune ........................................ False fp16 ............................................ True fp16_lm_cross_entropy ........................... False fp32_residual_connection ........................ False gigaflos_no_embeds .............................. 0 global_batch_size ............................... 512 glu_activation .................................. None hidden_dropout .................................. 0.1 hidden_size ..................................... 2048 hysteresis ...................................... 2 ict_head_size ................................... None ict_load ........................................ None img_dim ......................................... 224 indexer_batch_size .............................. 128 indexer_log_interval ............................ 1000 init_method_std ................................. 0.02 init_method_xavier_uniform ...................... False initial_loss_scale .............................. 4294967296 kv_channels ..................................... 128 layernorm_epsilon ............................... 1e-05 lazy_mpu_init ................................... None load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints local_rank ...................................... 0 log_batch_size_to_tensorboard ................... True log_interval .................................... 200 log_learning_rate_to_tensorboard ................ True log_level ....................................... None log_level_replica ............................... None log_loss_scale_to_tensorboard ................... True log_num_zeros_in_grad ........................... 
False log_params_norm ................................. False log_timers_to_tensorboard ....................... True log_validation_ppl_to_tensorboard ............... True loss_on_targets_only ............................ True loss_scale ...................................... 12.0 loss_scale_window ............................... 1000 lr .............................................. 0.0002 lr_decay_iters .................................. None lr_decay_samples ................................ 73242187 lr_decay_style .................................. cosine lr_decay_tokens ................................. None lr_warmup_fraction .............................. None lr_warmup_iters ................................. 0 lr_warmup_samples ............................... 183105 make_vocab_size_divisible_by .................... 128 mask_prob ....................................... 0.15 masked_softmax_fusion ........................... True max_position_embeddings ......................... 2048 memory_centric_tiled_linear ..................... False merge_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/data/gpt2-merges.txt micro_batch_size ................................ 1 min_loss_scale .................................. 1.0 min_lr .......................................... 1e-05 mmap_warmup ..................................... False no_load_optim ................................... None no_load_rng ..................................... None no_save_optim ................................... None no_save_rng ..................................... None num_attention_heads ............................. 16 num_channels .................................... 3 num_classes ..................................... 1000 num_layers ...................................... 24 num_layers_per_virtual_pipeline_stage ........... None num_workers ..................................... 
2 onnx_safe ....................................... None openai_gelu ..................................... False optimizer ....................................... adam override_lr_scheduler ........................... False params_dtype .................................... torch.float16 partition_activations ........................... False patch_dim ....................................... 16 pipeline_model_parallel_size .................... 2 position_embedding_type ......................... PositionEmbeddingType.absolute profile_backward ................................ False query_in_block_prob ............................. 0.1 rampup_batch_size ............................... ['32', '32', '2_000_000'] rank ............................................ 0 remote_device ................................... none reset_attention_mask ............................ False reset_position_ids .............................. False retriever_report_topk_accuracies ................ [] retriever_score_scaling ......................... False retriever_seq_length ............................ 256 reweight_loss_based_on_position_frequency ....... True sample_rate ..................................... 1.0 save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints save_interval ................................... 1500 scatter_gather_tensors_in_pipeline .............. True scattered_embeddings ............................ False seed ............................................ 1234 seq_length ...................................... 2048 sgd_momentum .................................... 0.9 short_seq_prob .................................. 0.1 split ........................................... 949,50,1 split_transformers .............................. False synchronize_each_layer .......................... False tensor_model_parallel_size ...................... 1 tensorboard_dir ................................. 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/tr6g-1B3-oscar-loss-reweighting-logs/tensorboard tensorboard_log_interval ........................ 1 tensorboard_queue_size .......................... 5 test_weighted_split_names ....................... None test_weighted_split_paths ....................... None test_weighted_split_splits ...................... None test_weighted_split_weights ..................... None tile_factor ..................................... 1 titles_data_path ................................ None tokenizer_name_or_path .......................... None tokenizer_type .................................. GPT2BPETokenizer train_iters ..................................... None train_samples ................................... 73242187 train_tokens .................................... None train_weighted_split_paths ...................... None use_bnb_optimizer ............................... False use_checkpoint_lr_scheduler ..................... False use_contiguous_buffers_in_ddp ................... False use_cpu_initialization .......................... None use_one_sent_docs ............................... False use_pin_memory .................................. False valid_weighted_split_names ...................... None valid_weighted_split_paths ...................... None valid_weighted_split_splits ..................... None valid_weighted_split_weights .................... None virtual_pipeline_model_parallel_size ............ None vocab_extra_ids ................................. 0 vocab_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/data/gpt2-vocab.json weight_decay .................................... 0.1 world_size ...................................... 64 zero_allgather_bucket_size ...................... 0.0 zero_contigious_gradients ....................... False zero_reduce_bucket_size ......................... 
0.0 zero_reduce_scatter ............................. False zero_stage ...................................... 1 -------------------- end of arguments --------------------- will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples. > building GPT2BPETokenizer tokenizer ... **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) > initializing torch distributed ... > initializing tensor model parallel with size 1 > initializing pipeline model parallel with size 2 > setting random seeds to 1234 ... [2021-11-26 17:00:28,780] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 > compiling dataset index builder ... make: Entering directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/data' make: Nothing to be done for 'default'. 
make: Leaving directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.365 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 11.631 seconds
time to initialize megatron (seconds): 40.905
[after megatron is initialized] datetime: 2021-11-26 17:00:40
building GPT model ...
[2021-11-26 17:00:40,945] [INFO] [utils.py:806:see_memory_usage] Before Building Model [2021-11-26 17:00:40,946] [INFO] [utils.py:807:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB [2021-11-26 17:00:40,946] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.34 GB, percent = 21.5% SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pipe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=1, data=0, model=0): 32, ProcessCoord(pipe=1, data=1, model=0): 33, ProcessCoord(pipe=1, data=2, model=0): 34, ProcessCoord(pipe=1, data=3, model=0): 35, ProcessCoord(pipe=1, data=4, model=0): 36, 
ProcessCoord(pipe=1, data=5, model=0): 37, ProcessCoord(pipe=1, data=6, model=0): 38, ProcessCoord(pipe=1, data=7, model=0): 39, ProcessCoord(pipe=1, data=8, model=0): 40, ProcessCoord(pipe=1, data=9, model=0): 41, ProcessCoord(pipe=1, data=10, model=0): 42, ProcessCoord(pipe=1, data=11, model=0): 43, ProcessCoord(pipe=1, data=12, model=0): 44, ProcessCoord(pipe=1, data=13, model=0): 45, ProcessCoord(pipe=1, data=14, model=0): 46, ProcessCoord(pipe=1, data=15, model=0): 47, ProcessCoord(pipe=1, data=16, model=0): 48, ProcessCoord(pipe=1, data=17, model=0): 49, ProcessCoord(pipe=1, data=18, model=0): 50, ProcessCoord(pipe=1, data=19, model=0): 51, ProcessCoord(pipe=1, data=20, model=0): 52, ProcessCoord(pipe=1, data=21, model=0): 53, ProcessCoord(pipe=1, data=22, model=0): 54, ProcessCoord(pipe=1, data=23, model=0): 55, ProcessCoord(pipe=1, data=24, model=0): 56, ProcessCoord(pipe=1, data=25, model=0): 57, ProcessCoord(pipe=1, data=26, model=0): 58, ProcessCoord(pipe=1, data=27, model=0): 59, ProcessCoord(pipe=1, data=28, model=0): 60, ProcessCoord(pipe=1, data=29, model=0): 61, ProcessCoord(pipe=1, data=30, model=0): 62, ProcessCoord(pipe=1, data=31, model=0): 63} [2021-11-26 17:00:42,252] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer stage=0 layers=15 0: _to_float16 1: EmbeddingPipe 2: 3: ParallelTransformerLayerPipe 4: ParallelTransformerLayerPipe 5: ParallelTransformerLayerPipe 6: ParallelTransformerLayerPipe 7: ParallelTransformerLayerPipe 8: ParallelTransformerLayerPipe 9: ParallelTransformerLayerPipe 10: ParallelTransformerLayerPipe 11: ParallelTransformerLayerPipe 12: ParallelTransformerLayerPipe 13: ParallelTransformerLayerPipe 14: ParallelTransformerLayerPipe stage=1 layers=17 15: ParallelTransformerLayerPipe 16: ParallelTransformerLayerPipe 17: ParallelTransformerLayerPipe 18: ParallelTransformerLayerPipe 19: ParallelTransformerLayerPipe 20: ParallelTransformerLayerPipe 21: ParallelTransformerLayerPipe 
22: ParallelTransformerLayerPipe 23: ParallelTransformerLayerPipe 24: ParallelTransformerLayerPipe 25: ParallelTransformerLayerPipe 26: ParallelTransformerLayerPipe 27: 28: 29: MixedFusedLayerNorm 30: EmbeddingPipe 31: float16_to_fp32 loss: CrossEntropy > number of parameters on (tensor, pipeline) model parallel rank (0, 1): 711520256 [2021-11-26 17:00:42,804] [INFO] [utils.py:806:see_memory_usage] After Building Model [2021-11-26 17:00:42,804] [INFO] [utils.py:807:see_memory_usage] MA 1.33 GB Max_MA 1.33 GB CA 1.36 GB Max_CA 1 GB [2021-11-26 17:00:42,805] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.57 GB, percent = 21.7% > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 711516160 setting training iterations to 152972 > learning rate decay style: cosine DeepSpeed is enabled. [2021-11-26 17:00:42,824] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.5.5+58a8e13, git-hash=58a8e13, git-branch=master [2021-11-26 17:00:43,133] [INFO] [engine.py:207:__init__] DeepSpeed Flops Profiler Enabled: False [2021-11-26 17:00:43,134] [INFO] [engine.py:862:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer [2021-11-26 17:00:43,134] [INFO] [engine.py:868:_configure_optimizer] Using client Optimizer as basic optimizer [2021-11-26 17:00:43,138] [INFO] [engine.py:884:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam [2021-11-26 17:00:43,138] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type= [2021-11-26 17:00:43,138] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer [2021-11-26 17:00:43,138] [INFO] [stage2.py:111:__init__] Reduce bucket size 500000000 [2021-11-26 17:00:43,138] [INFO] [stage2.py:112:__init__] Allgather bucket size 500000000 [2021-11-26 17:00:43,138] [INFO] [stage2.py:113:__init__] CPU Offload: False [2021-11-26 17:00:43,138] [INFO] [stage2.py:114:__init__] Round robin gradient 
partitioning: False Rank: 63 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 47 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 6 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 42 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 32 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 34 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 19 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 36 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 57 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 44 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 40 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 48 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 59 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 60 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 14 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 26 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 53 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 23 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 15 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 17 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 50 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 22 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 27 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 52 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 31 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 28 
partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 1 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 0 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 61 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 5 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 8 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 33 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 35 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 4 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 41 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 16 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 9 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 39 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 43 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 45 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 37 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 38 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 46 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 25 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 54 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 55 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 21 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 13 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 62 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 51 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 49 partition count [32, 32] and 
sizes[(22224896, False), (10112, False)] Rank: 58 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 2 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 12 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 24 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 29 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 10 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 3 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 20 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 7 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 56 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 18 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 11 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 30 partition count [32, 32] and sizes[(22224896, False), (9984, False)] [2021-11-26 17:00:44,723] [INFO] [utils.py:806:see_memory_usage] Before initializing optimizer states [2021-11-26 17:00:44,724] [INFO] [utils.py:807:see_memory_usage] MA 1.41 GB Max_MA 1.45 GB CA 2.71 GB Max_CA 3 GB [2021-11-26 17:00:44,724] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 42.39 GB, percent = 22.6% [2021-11-26 17:00:44,759] [INFO] [utils.py:806:see_memory_usage] After initializing optimizer states [2021-11-26 17:00:44,759] [INFO] [utils.py:807:see_memory_usage] MA 1.57 GB Max_MA 1.66 GB CA 2.96 GB Max_CA 3 GB [2021-11-26 17:00:44,760] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 42.51 GB, percent = 22.7% [2021-11-26 17:00:44,760] [INFO] [stage2.py:474:__init__] optimizer state initialized [2021-11-26 17:00:44,787] [INFO] [utils.py:806:see_memory_usage] After initializing ZeRO optimizer [2021-11-26 17:00:44,788] [INFO] [utils.py:807:see_memory_usage] MA 1.57 GB Max_MA 1.57 GB CA 
2.96 GB Max_CA 3 GB [2021-11-26 17:00:44,788] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 42.52 GB, percent = 22.7% [2021-11-26 17:00:44,788] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [2021-11-26 17:00:44,788] [INFO] [engine.py:599:_configure_lr_scheduler] DeepSpeed using client LR scheduler [2021-11-26 17:00:44,788] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2021-11-26 17:00:44,788] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)] [2021-11-26 17:00:44,788] [INFO] [config.py:940:print] DeepSpeedEngine configuration: [2021-11-26 17:00:44,788] [INFO] [config.py:944:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2021-11-26 17:00:44,788] [INFO] [config.py:944:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2021-11-26 17:00:44,788] [INFO] [config.py:944:print] allreduce_always_fp32 ........ False [2021-11-26 17:00:44,788] [INFO] [config.py:944:print] amp_enabled .................. False [2021-11-26 17:00:44,788] [INFO] [config.py:944:print] amp_params ................... False [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] checkpoint_tag_validation_enabled True [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] checkpoint_tag_validation_fail False [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] curriculum_enabled ........... False [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] curriculum_params ............ False [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] dataloader_drop_last ......... False [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] disable_allgather ............ 
False [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] dump_state ................... False [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1} [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] eigenvalue_enabled ........... False [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] eigenvalue_gas_boundary_resolution 1 [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] eigenvalue_layer_name ........ bert.encoder.layer [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] eigenvalue_layer_num ......... 0 [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] eigenvalue_max_iter .......... 100 [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] eigenvalue_stability ......... 1e-06 [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] eigenvalue_tol ............... 0.01 [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] eigenvalue_verbose ........... False [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] elasticity_enabled ........... False [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] fp16_enabled ................. True [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] fp16_master_weights_and_gradients False [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] fp16_mixed_quantize .......... False [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] global_rank .................. 0 [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] gradient_accumulation_steps .. 16 [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] gradient_clipping ............ 1.0 [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] gradient_predivide_factor .... 
1.0 [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] initial_dynamic_scale ........ 4096 [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] loss_scale ................... 0 [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] memory_breakdown ............. False [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] optimizer_legacy_fusion ...... False [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] optimizer_name ............... None [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] optimizer_params ............. None [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2021-11-26 17:00:44,789] [INFO] [config.py:944:print] pld_enabled .................. False [2021-11-26 17:00:44,790] [INFO] [config.py:944:print] pld_params ................... False [2021-11-26 17:00:44,790] [INFO] [config.py:944:print] prescale_gradients ........... False [2021-11-26 17:00:44,790] [INFO] [config.py:944:print] quantize_change_rate ......... 0.001 [2021-11-26 17:00:44,790] [INFO] [config.py:944:print] quantize_groups .............. 1 [2021-11-26 17:00:44,790] [INFO] [config.py:944:print] quantize_offset .............. 1000 [2021-11-26 17:00:44,790] [INFO] [config.py:944:print] quantize_period .............. 1000 [2021-11-26 17:00:44,790] [INFO] [config.py:944:print] quantize_rounding ............ 0 [2021-11-26 17:00:44,790] [INFO] [config.py:944:print] quantize_start_bits .......... 16 [2021-11-26 17:00:44,790] [INFO] [config.py:944:print] quantize_target_bits ......... 8 [2021-11-26 17:00:44,790] [INFO] [config.py:944:print] quantize_training_enabled .... False [2021-11-26 17:00:44,790] [INFO] [config.py:944:print] quantize_type ................ 0 [2021-11-26 17:00:44,790] [INFO] [config.py:944:print] quantize_verbose ............. False [2021-11-26 17:00:44,790] [INFO] [config.py:944:print] scheduler_name ............... 
None [2021-11-26 17:00:44,790] [INFO] [config.py:944:print] scheduler_params ............. None [2021-11-26 17:00:44,790] [INFO] [config.py:944:print] sparse_attention ............. None [2021-11-26 17:00:44,790] [INFO] [config.py:944:print] sparse_gradients_enabled ..... False [2021-11-26 17:00:44,790] [INFO] [config.py:944:print] steps_per_print .............. 2000 [2021-11-26 17:00:44,790] [INFO] [config.py:944:print] tensorboard_enabled .......... False [2021-11-26 17:00:44,790] [INFO] [config.py:944:print] tensorboard_job_name ......... DeepSpeedJobName [2021-11-26 17:00:44,790] [INFO] [config.py:944:print] tensorboard_output_path ...... [2021-11-26 17:00:44,790] [INFO] [config.py:944:print] train_batch_size ............. 512 [2021-11-26 17:00:44,790] [INFO] [config.py:944:print] train_micro_batch_size_per_gpu 1 [2021-11-26 17:00:44,790] [INFO] [config.py:944:print] use_quantizer_kernel ......... False [2021-11-26 17:00:44,790] [INFO] [config.py:944:print] wall_clock_breakdown ......... False [2021-11-26 17:00:44,790] [INFO] [config.py:944:print] world_size ................... 32 [2021-11-26 17:00:44,790] [INFO] [config.py:944:print] zero_allow_untested_optimizer False [2021-11-26 17:00:44,790] [INFO] [config.py:944:print] zero_config .................. { "stage": 1, "contiguous_gradients": true, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false } [2021-11-26 17:00:44,790] [INFO] [config.py:944:print] zero_enabled ................. 
True [2021-11-26 17:00:44,790] [INFO] [config.py:944:print] zero_optimization_stage ...... 1 [2021-11-26 17:00:44,790] [INFO] [config.py:946:print] json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 512, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false } [2021-11-26 17:00:44,791] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=16 micro_batch_size=1 [2021-11-26 17:00:44,819] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=15 [0, 15) STAGE_PARAMS=711516160 (711.516M) TOTAL_PARAMS=1423036416 (1423.036M) UNIQUE_PARAMS=1315819520 (1315.820M) [2021-11-26 17:00:44,819] [INFO] [engine.py:151:__init__] RANK=32 STAGE=1 LAYERS=17 [15, 32) STAGE_PARAMS=711520256 (711.520M) TOTAL_PARAMS=1423036416 (1423.036M) UNIQUE_PARAMS=1315819520 (1315.820M) > using checkpoint value 0.0002 for learning rate > using checkpoint value 1e-05 for minimum learning rate > using checkpoint value 183105 for warmup iterations > using checkpoint value 73242187 for total number of iterations > using checkpoint value cosine for decay style successfully loaded 32 ZeRO state_dicts for rank 53 successfully loaded 32 ZeRO state_dicts for rank 54 successfully loaded 32 ZeRO state_dicts for rank 55 successfully loaded 32 ZeRO state_dicts for rank 52 successfully loaded 32 ZeRO state_dicts for rank 32 successfully loaded 32 ZeRO state_dicts for rank 35 successfully loaded 32 ZeRO state_dicts for rank 61 successfully loaded 32 ZeRO state_dicts for rank 60 successfully loaded 32 ZeRO state_dicts for rank 63 successfully loaded 32 ZeRO state_dicts for rank 62 successfully loaded 32 ZeRO state_dicts for rank 34 successfully loaded 32 ZeRO state_dicts for rank 33 successfully loaded 32 ZeRO state_dicts for rank 47 successfully loaded 32 ZeRO state_dicts for rank 46 successfully 
loaded 32 ZeRO state_dicts for rank 44 successfully loaded 32 ZeRO state_dicts for rank 45 successfully loaded 32 ZeRO state_dicts for rank 41 successfully loaded 32 ZeRO state_dicts for rank 42 successfully loaded 32 ZeRO state_dicts for rank 40 successfully loaded 32 ZeRO state_dicts for rank 43 successfully loaded 32 ZeRO state_dicts for rank 57 successfully loaded 32 ZeRO state_dicts for rank 58 successfully loaded 32 ZeRO state_dicts for rank 59 successfully loaded 32 ZeRO state_dicts for rank 56 successfully loaded 32 ZeRO state_dicts for rank 37 successfully loaded 32 ZeRO state_dicts for rank 48 successfully loaded 32 ZeRO state_dicts for rank 36 successfully loaded 32 ZeRO state_dicts for rank 39 successfully loaded 32 ZeRO state_dicts for rank 38 successfully loaded 32 ZeRO state_dicts for rank 50 successfully loaded 32 ZeRO state_dicts for rank 51 successfully loaded 32 ZeRO state_dicts for rank 49 successfully loaded 32 ZeRO state_dicts for rank 20 successfully loaded 32 ZeRO state_dicts for rank 23 successfully loaded 32 ZeRO state_dicts for rank 22 successfully loaded 32 ZeRO state_dicts for rank 21 successfully loaded 32 ZeRO state_dicts for rank 2 successfully loaded 32 ZeRO state_dicts for rank 0 successfully loaded 32 ZeRO state_dicts for rank 3 successfully loaded 32 ZeRO state_dicts for rank 1 successfully loaded 32 ZeRO state_dicts for rank 6 successfully loaded 32 ZeRO state_dicts for rank 4 successfully loaded 32 ZeRO state_dicts for rank 7 successfully loaded 32 ZeRO state_dicts for rank 5 successfully loaded 32 ZeRO state_dicts for rank 13 successfully loaded 32 ZeRO state_dicts for rank 15 successfully loaded 32 ZeRO state_dicts for rank 14 successfully loaded 32 ZeRO state_dicts for rank 12 successfully loaded 32 ZeRO state_dicts for rank 8 successfully loaded 32 ZeRO state_dicts for rank 9 successfully loaded 32 ZeRO state_dicts for rank 11 successfully loaded 32 ZeRO state_dicts for rank 10 successfully loaded 32 ZeRO state_dicts for rank 28 
successfully loaded 32 ZeRO state_dicts for rank 31 successfully loaded 32 ZeRO state_dicts for rank 30 successfully loaded 32 ZeRO state_dicts for rank 29 successfully loaded 32 ZeRO state_dicts for rank 24 successfully loaded 32 ZeRO state_dicts for rank 27 successfully loaded 32 ZeRO state_dicts for rank 26 successfully loaded 32 ZeRO state_dicts for rank 25 successfully loaded 32 ZeRO state_dicts for rank 17 successfully loaded 32 ZeRO state_dicts for rank 19 successfully loaded 32 ZeRO state_dicts for rank 18 successfully loaded 32 ZeRO state_dicts for rank 16 loading 32 zero partition checkpoints for rank 52 loading 32 zero partition checkpoints for rank 54 loading 32 zero partition checkpoints for rank 35 loading 32 zero partition checkpoints for rank 60 loading 32 zero partition checkpoints for rank 40 loading 32 zero partition checkpoints for rank 53 loading 32 zero partition checkpoints for rank 42 loading 32 zero partition checkpoints for rank 32 loading 32 zero partition checkpoints for rank 45 loading 32 zero partition checkpoints for rank 55 loading 32 zero partition checkpoints for rank 34 loading 32 zero partition checkpoints for rank 43 loading 32 zero partition checkpoints for rank 61 loading 32 zero partition checkpoints for rank 41 loading 32 zero partition checkpoints for rank 37 loading 32 zero partition checkpoints for rank 36 loading 32 zero partition checkpoints for rank 47 loading 32 zero partition checkpoints for rank 49 loading 32 zero partition checkpoints for rank 56 loading 32 zero partition checkpoints for rank 59 loading 32 zero partition checkpoints for rank 51 loading 32 zero partition checkpoints for rank 50 loading 32 zero partition checkpoints for rank 33 loading 32 zero partition checkpoints for rank 48 loading 32 zero partition checkpoints for rank 44 loading 32 zero partition checkpoints for rank 46 loading 32 zero partition checkpoints for rank 2 loading 32 zero partition checkpoints for rank 1 loading 32 zero partition 
checkpoints for rank 3 loading 32 zero partition checkpoints for rank 0 loading 32 zero partition checkpoints for rank 23 loading 32 zero partition checkpoints for rank 5 checkpoint version 3.0 loading 32 zero partition checkpoints for rank 7 loading 32 zero partition checkpoints for rank 4 loading 32 zero partition checkpoints for rank 22 loading 32 zero partition checkpoints for rank 6 loading 32 zero partition checkpoints for rank 20 loading 32 zero partition checkpoints for rank 13 loading 32 zero partition checkpoints for rank 21 loading 32 zero partition checkpoints for rank 38 loading 32 zero partition checkpoints for rank 14 loading 32 zero partition checkpoints for rank 39 loading 32 zero partition checkpoints for rank 58 loading 32 zero partition checkpoints for rank 57 loading 32 zero partition checkpoints for rank 11 loading 32 zero partition checkpoints for rank 9 loading 32 zero partition checkpoints for rank 28 loading 32 zero partition checkpoints for rank 8 loading 32 zero partition checkpoints for rank 63 loading 32 zero partition checkpoints for rank 18 loading 32 zero partition checkpoints for rank 31 loading 32 zero partition checkpoints for rank 10 loading 32 zero partition checkpoints for rank 27 loading 32 zero partition checkpoints for rank 30 loading 32 zero partition checkpoints for rank 62 loading 32 zero partition checkpoints for rank 24 loading 32 zero partition checkpoints for rank 19 loading 32 zero partition checkpoints for rank 17 loading 32 zero partition checkpoints for rank 16 loading 32 zero partition checkpoints for rank 26 loading 32 zero partition checkpoints for rank 29 loading 32 zero partition checkpoints for rank 25 loading 32 zero partition checkpoints for rank 15 loading 32 zero partition checkpoints for rank 12 successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints at iteration 93876 time (ms) | load-checkpoint: 14095.71 
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the 
embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP 
> 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") 
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the 
embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP 
> 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") 
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the 
embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP 
> 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") 
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the 
embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
estimated model parameters: 1.42303232
estimated model parameters without embeddings: 1.208598528
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-11-26 17:00:58
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
train: 73242187
validation: 7833600
test: 51200
> building train, validation, and test datasets for GPT ...
> building dataset index ...
reading sizes...
reading pointers...
reading document index...
creating numpy buffer of mmap...
creating memory view of numpy buffer...
> finished creating indexed dataset in 0.090975 seconds
number of documents: 304230423
> dataset split:
train: document indices in [0, 288714672) total of 288714672 documents
validation: document indices in [288714672, 303926193) total of 15211521 documents
test: document indices in [303926193, 304230423) total of 304230 documents
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
loaded indexed file in 0.093 seconds
total number of samples: 131537224
total number of epochs: 1
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
loaded indexed file in 0.250 seconds
total number of samples: 13854322
total number of epochs: 2
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
loaded indexed file in 0.075 seconds
total number of samples: 137384
total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-11-26 17:01:10
done with setup ...
training ...
time (ms) | model-and-optimizer-setup: 18043.40 | train/valid/test-data-iterators-setup: 11911.73
Number of parameters: 1.42303232 billion
Number of parameters: 1.423040512 billion
Number of parameters without embeddings: 1.208598528 billion
Number of parameters without embeddings: 1.20860672 billion
[before the start of training step] datetime: 2021-11-26 17:01:10
[2021-11-26 17:01:11,002] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information
[2021-11-26 17:01:11,002] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-11-26 17:01:11,002] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-11-26 17:01:11,002] [INFO] [checkpointing.py:554:forward] ----Synchronization False
[2021-11-26 17:01:11,002] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False
/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
if t.grad is not None:
/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward().
If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
if t.grad is not None:
[2021-11-26 17:10:50,454] [INFO] [logging.py:68:log_dist] [Rank 0] step=94000, skipped=195, lr=[7.979369029973826e-05, 7.979369029973826e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 94000 loss: 1.0057 iter time (s): 0.002 samples/sec: 220777.886
[Rank 0] (after 94000 iterations) memory (MB) | allocated: 1631.6650390625 | max allocated: 3929.2744140625 | reserved: 6816.0 | max reserved: 6816.0
[Rank 32] (after 94000 iterations) memory (MB) | allocated: 2443.63623046875 | max allocated: 4725.25341796875 | reserved: 7900.0 | max reserved: 7900.0
iteration 94000/ 152972 | consumed samples: 43048384 | consumed tokens: 88163090432 | elapsed time per iteration (ms): 4674.8 | learning rate: 7.979E-05 | global batch size: 512 | lm loss: 1.436654E+00 | loss scale: 131072.0 | grad norm: 9644.238 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 94000 | lm loss value: 1.397260E+00 | lm loss PPL: 4.044104E+00 |
-------------------------------------------------------------------------------------------
iteration 94200/ 152972 | consumed samples: 43150784 | consumed tokens: 88372805632 | elapsed time per iteration (ms): 5193.7 | learning rate: 7.939E-05 | global batch size: 512 | lm loss: 1.465592E+00 | loss scale: 131072.0 | grad norm: 14924.852 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 94400/ 152972 | consumed samples: 43253184 | consumed tokens: 88582520832 | elapsed time per iteration (ms): 4653.0 | learning rate: 7.899E-05 | global batch size: 512 | lm loss: 1.507431E+00 | loss scale: 131072.0 | grad norm: 13182.394 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 94500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-26 17:51:27,569] [INFO]
[logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/mp_rank_00_model_states.pt [2021-11-26 17:51:27,996] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,004] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,005] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,005] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,006] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,008] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,008] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,009] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,009] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,010] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,010] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,012] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,012] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,012] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,013] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,014] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,014] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,016] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,017] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,018] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,018] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,018] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,020] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,020] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,020] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,021] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,021] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,028] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,030] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,034] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,034] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,036] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,037] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,039] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,040] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,042] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,043] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,047] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,048] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,048] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,048] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,048] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,049] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,050] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,050] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,051] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,052] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,052] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,053] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,053] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,053] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,054] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,056] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,057] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,058] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,058] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,059] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,060] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,062] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,065] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-26 17:51:28,066] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,066] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,066] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-26 17:51:28,070] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step94500/zero_pp_rank_10_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 94500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2889.82
iteration 94600/ 152972 | consumed samples: 43355584 | consumed tokens: 88792236032 | elapsed time per iteration (ms): 4670.8 | learning rate: 7.859E-05 | global batch size: 512 | lm loss: 1.466808E+00 | loss scale: 262144.0 | grad norm: 29497.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 94800/ 152972 | consumed samples: 43457984 | consumed tokens: 89001951232 | elapsed time per iteration (ms): 4642.5 | learning rate: 7.819E-05 | global batch size: 512 | lm loss: 1.439583E+00 | loss scale: 65536.0 | grad norm: 5505.137 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 95000/ 152972 | consumed samples: 43560384 | consumed tokens: 89211666432 | elapsed time per iteration (ms): 4642.2 | learning rate: 7.779E-05 | global batch size: 512 | lm loss: 1.499571E+00 | loss scale: 65536.0 | grad norm: 5510.268 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 95000 | lm loss value: 1.416083E+00 | lm loss PPL: 4.120948E+00 |
-------------------------------------------------------------------------------------------
iteration 95200/ 152972 | consumed samples: 43662784 | consumed tokens: 89421381632 | elapsed time per iteration (ms): 5171.5 | learning rate: 7.739E-05 | global batch size: 512 | lm loss: 1.447289E+00 | loss scale: 65536.0 | grad norm: 9471.090 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 95400/ 152972 | consumed samples: 43765184 | consumed tokens: 89631096832 | elapsed time per iteration (ms): 4635.6 | learning rate: 7.699E-05 | global batch size: 512 | lm loss: 1.459376E+00 | loss scale: 131072.0 | grad norm: 16320.585 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 95600/ 152972 | consumed samples: 43867584 | consumed tokens: 89840812032 | elapsed time per iteration (ms): 4647.6 | learning rate: 7.659E-05 | global batch size: 512 | lm loss: 1.431986E+00 | loss scale: 131072.0 | grad norm: 14788.197 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 95800/ 152972 | consumed samples: 43969984 | consumed tokens: 90050527232 | elapsed time per iteration (ms): 4641.5 | learning rate: 7.619E-05 | global batch size: 512 | lm loss: 1.447985E+00 | loss scale: 262144.0 | grad norm: 19614.140 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-26 19:49:15,276] [INFO] [logging.py:68:log_dist] [Rank 0] step=96000, skipped=200, lr=[7.579619930010632e-05, 7.579619930010632e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 96000 loss: 1.3746 iter time (s): 0.002 samples/sec: 221522.009
iteration 96000/ 152972 | consumed samples: 44072384 | consumed tokens: 90260242432 | elapsed time per iteration (ms): 4625.6 | learning rate: 7.580E-05 | global batch size: 512 | lm loss: 1.474051E+00 | loss scale: 131072.0 | grad norm: 12708.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 96000 | lm loss value: 1.418481E+00 | lm loss PPL: 4.130843E+00 |
-------------------------------------------------------------------------------------------
saving checkpoint at iteration 96000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-26 19:51:05,371] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint:
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/mp_rank_00_model_states.pt [2021-11-26 19:51:05,798] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,798] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,800] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,800] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,801] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,801] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,804] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,804] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,805] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,806] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,806] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,806] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,806] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,807] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,807] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,809] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,809] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,810] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,811] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,814] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,815] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,816] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,817] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,820] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,820] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,821] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,824] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,824] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,825] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,825] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,826] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,828] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,831] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,832] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,833] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,833] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,835] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,835] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,837] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,838] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,838] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,839] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,841] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,841] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,843] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,843] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,843] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,843] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,843] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,844] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,845] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,847] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,847] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,847] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,852] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,853] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,854] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,854] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,857] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,862] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,867] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-26 19:51:05,868] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,868] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-26 19:51:05,868] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step96000/zero_pp_rank_12_mp_rank_01_optim_states.pt successfully saved checkpoint at iteration 96000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2793.07 iteration 96200/ 152972 | consumed samples: 44174784 | consumed tokens: 90469957632 | elapsed time per iteration (ms): 5193.1 | learning rate: 7.540E-05 | global batch size: 512 | lm loss: 1.488286E+00 | loss scale: 65536.0 | grad norm: 4492.078 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 96400/ 152972 | consumed samples: 44277184 | consumed tokens: 90679672832 | elapsed time per iteration (ms): 4640.1 | learning rate: 7.500E-05 | global batch size: 512 | lm loss: 1.477757E+00 | loss scale: 65536.0 | grad norm: 8009.700 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 96600/ 152972 | consumed samples: 44379584 | consumed tokens: 90889388032 | elapsed time per iteration (ms): 4644.6 | learning rate: 7.461E-05 | global batch size: 512 | lm loss: 1.455136E+00 | loss scale: 131072.0 | grad norm: 15379.709 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 96800/ 152972 | consumed samples: 44481984 | consumed tokens: 91099103232 | elapsed time per iteration (ms): 4635.1 | learning rate: 7.421E-05 | global batch size: 512 | lm loss: 1.416874E+00 | loss scale: 131072.0 | grad norm: 6307.451 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 97000/ 152972 | consumed samples: 44584384 | consumed tokens: 91308818432 | elapsed time per iteration (ms): 4642.7 | learning rate: 7.382E-05 | global batch size: 512 | lm loss: 1.468490E+00 | loss scale: 131072.0 | grad norm: 13210.931 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | 
------------------------------------------------------------------------------------------- valid loss at iteration 97000 | lm loss value: 1.349474E+00 | lm loss PPL: 3.855396E+00 | ------------------------------------------------------------------------------------------- iteration 97200/ 152972 | consumed samples: 44686784 | consumed tokens: 91518533632 | elapsed time per iteration (ms): 5190.2 | learning rate: 7.342E-05 | global batch size: 512 | lm loss: 1.456204E+00 | loss scale: 262144.0 | grad norm: 18098.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 97400/ 152972 | consumed samples: 44789184 | consumed tokens: 91728248832 | elapsed time per iteration (ms): 4635.9 | learning rate: 7.303E-05 | global batch size: 512 | lm loss: 1.487451E+00 | loss scale: 65536.0 | grad norm: 4676.695 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | saving checkpoint at iteration 97500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-26 21:48:58,939] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/mp_rank_00_model_states.pt [2021-11-26 21:48:59,363] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,368] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,369] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_16_mp_rank_00_optim_states.pt 
[2021-11-26 21:48:59,370] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,370] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-26 21:48:59,371] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,372] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,372] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-26 21:48:59,373] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,374] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-26 21:48:59,375] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-26 21:48:59,375] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint 
saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-26 21:48:59,376] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,377] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-26 21:48:59,377] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,378] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,378] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-26 21:48:59,380] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-26 21:48:59,380] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-26 21:48:59,381] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,381] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-26 21:48:59,381] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-26 21:48:59,383] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-26 21:48:59,388] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-26 21:48:59,389] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,391] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-26 21:48:59,391] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,396] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,396] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,399] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,401] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,402] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-26 21:48:59,402] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,404] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-26 21:48:59,405] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,405] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,406] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,406] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-26 21:48:59,407] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,407] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-26 21:48:59,408] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-26 21:48:59,409] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,410] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-26 21:48:59,410] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,411] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-26 21:48:59,412] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-26 21:48:59,412] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-26 21:48:59,412] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,413] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,414] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-26 21:48:59,415] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-26 21:48:59,418] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,419] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,420] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,420] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-26 21:48:59,421] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-26 21:48:59,421] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-26 21:48:59,422] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,424] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-26 21:48:59,425] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_22_mp_rank_01_optim_states.pt
[2021-11-26 21:48:59,432] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_29_mp_rank_00_optim_states.pt
[2021-11-26 21:48:59,435] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_16_mp_rank_01_optim_states.pt
[2021-11-26 21:48:59,436] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_19_mp_rank_01_optim_states.pt
[2021-11-26 21:48:59,441] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step97500/zero_pp_rank_21_mp_rank_01_optim_states.pt
successfully saved checkpoint at iteration 97500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2693.68
iteration 97600/ 152972 | consumed samples: 44891584 | consumed tokens: 91937964032 | elapsed time per iteration (ms): 4659.7 | learning rate: 7.264E-05 | global batch size: 512 | lm loss: 1.434719E+00 | loss scale: 65536.0 | grad norm: 10441.365 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 97800/ 152972 | consumed samples: 44993984 | consumed tokens: 92147679232 | elapsed time per iteration (ms): 4648.6 | learning rate: 7.225E-05 | global batch size: 512 | lm loss: 1.452609E+00 | loss scale: 65536.0 | grad norm: 8967.126 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-26 22:27:42,255] [INFO] [logging.py:68:log_dist] [Rank 0] step=98000, skipped=204, lr=[7.185307907113102e-05, 7.185307907113102e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 98000 loss: 1.5726 iter time (s): 0.002 samples/sec: 220387.565
iteration 98000/ 152972 | consumed samples: 45096384 | consumed tokens: 92357394432 | elapsed time per iteration (ms): 4644.9 | learning rate: 7.185E-05 | global batch size: 512 | lm loss: 1.451152E+00 | loss scale: 131072.0 | grad norm: 14914.651 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 98000 | lm loss value: 1.412284E+00 | lm loss PPL: 4.105323E+00 |
-------------------------------------------------------------------------------------------
iteration 98200/ 152972 | consumed samples: 45198784 | consumed tokens: 92567109632 | elapsed time per iteration (ms): 5172.0 | learning rate: 7.146E-05 | global batch size: 512 | lm loss: 1.443396E+00 | loss scale: 131072.0 | grad norm: 11073.611 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 98400/ 152972 | consumed samples: 45301184 | consumed tokens: 92776824832 | elapsed time per iteration (ms): 4645.0 | learning rate: 7.107E-05 | global batch size: 512 | lm loss: 1.381671E+00 | loss scale: 131072.0 | grad norm: 9229.476 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 98600/ 152972 | consumed samples: 45403584 | consumed tokens: 92986540032 | elapsed time per iteration (ms): 4643.9 | learning rate: 7.068E-05 | global batch size: 512 | lm loss: 1.447674E+00 | loss scale: 65536.0 | grad norm: 8239.555 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 98800/ 152972 | consumed samples: 45505984 | consumed tokens: 93196255232 | elapsed time per iteration (ms): 4637.5 | learning rate: 7.029E-05 | global batch size: 512 | lm loss: 1.471367E+00 | loss scale: 65536.0 | grad norm: 5879.242 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 99000/ 152972 | consumed samples: 45608384 | consumed tokens: 93405970432 | elapsed time per iteration (ms): 4629.7 | learning rate: 6.991E-05 | global batch size: 512 | lm loss: 1.523957E+00 | loss scale: 131072.0 | grad norm: 22853.794 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
-------------------------------------------------------------------------------------------
valid loss at iteration 99000 | lm loss value: 1.364398E+00 | lm loss PPL: 3.913367E+00 |
-------------------------------------------------------------------------------------------
saving checkpoint at iteration 99000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-26 23:48:38,112] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/mp_rank_00_model_states.pt
[2021-11-26 23:48:38,545] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_13_mp_rank_01_optim_states.pt
[2021-11-26 23:48:38,546] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_10_mp_rank_00_optim_states.pt
[2021-11-26 23:48:38,547] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_11_mp_rank_00_optim_states.pt
[2021-11-26 23:48:38,549] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-26 23:48:38,553] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-26 23:48:38,554] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-26 23:48:38,556] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-26 23:48:38,556] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-26 23:48:38,558] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-26 23:48:38,558] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-26 23:48:38,559] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-26 23:48:38,559] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-26 23:48:38,559] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-26 23:48:38,560] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-26 23:48:38,560] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-26 23:48:38,561] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-26 23:48:38,562] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-26 23:48:38,562] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-26 23:48:38,563] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-26 23:48:38,566] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-26 23:48:38,567] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-26 23:48:38,569] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-26 23:48:38,571] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-26 23:48:38,572] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-26 23:48:38,574] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-26 23:48:38,574] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-26 23:48:38,575] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-26 23:48:38,576] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-26 23:48:38,578] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-26 23:48:38,582] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-26 23:48:38,583] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-26 23:48:38,583] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-26 23:48:38,585] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-26 23:48:38,587] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-26 23:48:38,588] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-26 23:48:38,590] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-26 23:48:38,592] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-26 23:48:38,593] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-26 23:48:38,593] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-26 23:48:38,594] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-26 23:48:38,595] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-26 23:48:38,597] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-26 23:48:38,599] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-26 23:48:38,599] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-26 23:48:38,620] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-26 23:48:38,622] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-26 23:48:38,624] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-26 23:48:38,625] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-26 23:48:38,626] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-26 23:48:38,626] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-26 23:48:38,631] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-26 23:48:38,631] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-26 23:48:38,631] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-26 23:48:38,631] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-26 23:48:38,631] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-26 23:48:38,631] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-26 23:48:38,631] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-26 23:48:38,631] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-26 23:48:38,631] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-26 23:48:38,632] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_15_mp_rank_01_optim_states.pt
[2021-11-26 23:48:38,632] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_24_mp_rank_00_optim_states.pt
[2021-11-26 23:48:38,635] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_25_mp_rank_01_optim_states.pt
[2021-11-26 23:48:38,636] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_5_mp_rank_01_optim_states.pt
[2021-11-26 23:48:38,642] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step99000/zero_pp_rank_30_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 99000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2697.12
iteration 99200/ 152972 | consumed samples: 45710784 | consumed tokens: 93615685632 | elapsed time per iteration (ms): 5206.6 | learning rate: 6.952E-05 | global batch size: 512 | lm loss: 1.478810E+00 | loss scale: 131072.0 | grad norm: 12565.490 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 99400/ 152972 | consumed samples: 45813184 | consumed tokens: 93825400832 | elapsed time per iteration (ms): 4649.5 | learning rate: 6.913E-05 | global batch size: 512 | lm loss: 1.465013E+00 | loss scale: 65536.0 | grad norm: 3504.410 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 99600/ 152972 | consumed samples: 45915584 | consumed tokens: 94035116032 | elapsed time per iteration (ms): 4637.9 | learning rate: 6.875E-05 | global batch size: 512 | lm loss: 1.473047E+00 | loss scale: 65536.0 | grad norm: 5043.962 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 99800/ 152972 | consumed samples: 46017984 | consumed tokens: 94244831232 | elapsed time per iteration (ms): 4634.1 | learning rate: 6.836E-05 | global batch size: 512 | lm loss: 1.450259E+00 | loss scale: 65536.0 | grad norm: 7100.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-27 01:06:03,554] [INFO] [logging.py:68:log_dist] [Rank 0] step=100000, skipped=209, lr=[6.797588584121581e-05, 6.797588584121581e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 100000 loss: 1.0341 iter time (s): 0.002 samples/sec: 219897.766
iteration 100000/ 152972 | consumed samples: 46120384 | consumed tokens: 94454546432 | elapsed time per iteration (ms): 4650.3 | learning rate: 6.798E-05 | global batch size: 512 | lm loss: 1.464874E+00 | loss scale: 131072.0 | grad norm: 11201.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 100000 | lm loss value: 1.429840E+00 | lm loss PPL: 4.178032E+00 |
--------------------------------------------------------------------------------------------
iteration 100200/ 152972 | consumed samples: 46222784 | consumed tokens: 94664261632 | elapsed time per iteration (ms): 5171.1 | learning rate: 6.759E-05 | global batch size: 512 | lm loss: 1.430290E+00 | loss scale: 131072.0 | grad norm: 13135.143 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 100400/ 152972 | consumed samples: 46325184 | consumed tokens: 94873976832 | elapsed time per iteration (ms): 4638.1 | learning rate: 6.721E-05 | global batch size: 512 | lm loss: 1.399518E+00 | loss scale: 131072.0 | grad norm: 10806.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 100500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-27 01:46:32,124] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/mp_rank_00_model_states.pt
[2021-11-27 01:46:32,551] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_0_mp_rank_01_optim_states.pt
[2021-11-27 01:46:32,551] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_23_mp_rank_00_optim_states.pt
[2021-11-27 01:46:32,553] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_3_mp_rank_01_optim_states.pt
[2021-11-27 01:46:32,556] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_18_mp_rank_01_optim_states.pt
[2021-11-27 01:46:32,556] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_1_mp_rank_00_optim_states.pt
[2021-11-27 01:46:32,556] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_17_mp_rank_00_optim_states.pt
[2021-11-27 01:46:32,556] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint
saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-27 01:46:32,558] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-27 01:46:32,561] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-27 01:46:32,562] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-27 01:46:32,562] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-27 01:46:32,563] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-27 01:46:32,563] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-27 01:46:32,563] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-27 01:46:32,564] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-27 01:46:32,564] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-27 01:46:32,565] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-27 01:46:32,565] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-27 01:46:32,566] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-27 01:46:32,567] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-27 01:46:32,568] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-27 01:46:32,568] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-27 01:46:32,570] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-27 01:46:32,573] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-27 01:46:32,573] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-27 01:46:32,574] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-27 01:46:32,575] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-27 01:46:32,575] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-27 01:46:32,579] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-27 01:46:32,579] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-27 01:46:32,579] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-27 01:46:32,581] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-27 01:46:32,581] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-27 01:46:32,583] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-27 01:46:32,586] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-27 01:46:32,586] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-27 01:46:32,587] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-27 01:46:32,587] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-27 01:46:32,590] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-27 01:46:32,591] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-27 01:46:32,593] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-27 01:46:32,593] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-27 01:46:32,594] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-27 01:46:32,597] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-27 01:46:32,597] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-27 01:46:32,597] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-27 01:46:32,597] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-27 01:46:32,599] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-27 01:46:32,600] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-27 01:46:32,600] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-27 01:46:32,601] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-27 01:46:32,602] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-27 01:46:32,603] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-27 01:46:32,603] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-27 01:46:32,604] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-27 01:46:32,605] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-27 01:46:32,605] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-27 01:46:32,605] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-27 01:46:32,606] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-27 01:46:32,606] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-27 01:46:32,606] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-27 01:46:32,609] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-27 01:46:32,615] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_19_mp_rank_01_optim_states.pt
[2021-11-27 01:46:32,626] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step100500/zero_pp_rank_22_mp_rank_01_optim_states.pt
successfully saved checkpoint at iteration 100500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2828.21
iteration 100600/ 152972 | consumed samples: 46427584 | consumed tokens: 95083692032 | elapsed time per iteration (ms): 4657.6 | learning rate: 6.683E-05 | global batch size: 512 | lm loss: 1.474608E+00 | loss scale: 262144.0 | grad norm: 28542.175 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 100800/ 152972 | consumed samples: 46529984 | consumed tokens: 95293407232 | elapsed time per iteration (ms): 4643.2 | learning rate: 6.644E-05 | global batch size: 512 | lm loss: 1.454591E+00 | loss scale: 131072.0 | grad norm: 13315.719 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 101000/ 152972 | consumed samples: 46632384 | consumed tokens: 95503122432 | elapsed time per iteration (ms): 4629.0 | learning rate: 6.607E-05 | global batch size: 512 | lm loss: 1.453657E+00 | loss scale: 32768.0 | grad norm: 3113.927 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 101000 | lm loss value: 1.426318E+00 | lm loss PPL: 4.163340E+00 |
--------------------------------------------------------------------------------------------
iteration 101200/ 152972 | consumed samples: 46734784 | consumed tokens: 95712837632 | elapsed time per iteration (ms): 5171.0 | learning rate: 6.569E-05 | global batch size: 512 | lm loss: 1.451619E+00 | loss scale: 32768.0 | grad norm: 3986.666 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 101400/ 152972 | consumed samples: 46837184 | consumed tokens: 95922552832 | elapsed time per iteration (ms): 4638.9 | learning rate: 6.530E-05 | global batch size: 512 | lm loss: 1.466128E+00 | loss scale: 32768.0 | grad norm: 4255.608 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 101600/ 152972 | consumed samples: 46939584 | consumed tokens: 96132268032 | elapsed time per iteration (ms): 4655.2 | learning rate: 6.493E-05 | global batch size: 512 | lm loss: 1.446348E+00 | loss scale: 65536.0 | grad norm: 9549.320 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 101800/ 152972 | consumed samples: 47041984 | consumed tokens: 96341983232 | elapsed time per iteration (ms): 4653.0 | learning rate: 6.455E-05 | global batch size: 512 | lm loss: 1.448537E+00 | loss scale: 65536.0 | grad norm: 8570.622 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-27 03:44:22,725] [INFO] [logging.py:68:log_dist] [Rank 0] step=102000, skipped=213, lr=[6.416821949895536e-05, 6.416821949895536e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 102000 loss: 1.4696 iter time (s): 0.002 samples/sec: 222361.132
iteration 102000/ 152972 | consumed samples: 47144384 | consumed tokens: 96551698432 | elapsed time per iteration (ms): 4638.6 | learning rate: 6.417E-05 | global batch size: 512 | lm loss: 1.450586E+00 | loss scale: 131072.0 | grad norm: 15104.055 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 102000 | lm loss value: 1.376134E+00 | lm loss PPL: 3.959565E+00 |
-------------------------------------------------------------------------------------------- saving checkpoint at iteration 102000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-27 03:46:12,689] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/mp_rank_00_model_states.pt [2021-11-27 03:46:13,141] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,141] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,144] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,146] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,146] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,146] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,147] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,147] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,147] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,148] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,148] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,149] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,149] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,152] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,152] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,154] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,154] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,155] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,157] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,157] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,157] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,159] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,159] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,160] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,161] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,161] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,162] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,163] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,165] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,167] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,169] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,173] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,173] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,173] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,174] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,180] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,180] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,180] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,182] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,182] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,183] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,186] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,189] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,191] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,191] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,192] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,192] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,194] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,197] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,198] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,199] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,199] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,200] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,201] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,202] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,204] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,204] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,204] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,205] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,205] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-27 03:46:13,207] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,210] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-27 03:46:13,211] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_26_mp_rank_00_optim_states.pt
[2021-11-27 03:46:13,327] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step102000/zero_pp_rank_0_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 102000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2766.02
iteration 102200/ 152972 | consumed samples: 47246784 | consumed tokens: 96761413632 | elapsed time per iteration (ms): 5196.9 | learning rate: 6.379E-05 | global batch size: 512 | lm loss: 1.476519E+00 | loss scale: 131072.0 | grad norm: 12938.905 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 102400/ 152972 | consumed samples: 47349184 | consumed tokens: 96971128832 | elapsed time per iteration (ms): 4639.7 | learning rate: 6.341E-05 | global batch size: 512 | lm loss: 1.433609E+00 | loss scale: 131072.0 | grad norm: 10828.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 102600/ 152972 | consumed samples: 47451584 | consumed tokens: 97180844032 | elapsed time per iteration (ms): 4648.7 | learning rate: 6.304E-05 | global batch size: 512 | lm loss: 1.412398E+00 | loss scale: 262144.0 | grad norm: 25086.186 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 102800/ 152972 | consumed samples: 47553984 | consumed tokens: 97390559232 | elapsed time per iteration (ms): 4652.6 | learning rate: 6.267E-05 | global batch size: 512 | lm loss: 1.431883E+00 | loss scale: 131072.0 | grad norm: 12481.100 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 103000/ 152972 | consumed samples: 47656384 | consumed tokens: 97600274432 | elapsed time per iteration (ms): 4632.6 | learning rate: 6.229E-05 | global batch size: 512 | lm loss: 1.458573E+00 | loss scale: 131072.0 | grad norm: 20120.723 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 103000 | lm loss value: 1.434458E+00 | lm loss PPL: 4.197368E+00 |
--------------------------------------------------------------------------------------------
iteration 103200/ 152972 | consumed samples: 47758784 | consumed tokens: 97809989632 | elapsed time per iteration (ms): 5175.0 | learning rate: 6.192E-05 | global batch size: 512 | lm loss: 1.424665E+00 | loss scale: 65536.0 | grad norm: 6229.322 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 103400/ 152972 | consumed samples: 47861184 | consumed tokens: 98019704832 | elapsed time per iteration (ms): 4635.1 | learning rate: 6.155E-05 | global batch size: 512 | lm loss: 1.439639E+00 | loss scale: 65536.0 | grad norm: 6627.725 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 103500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-27 05:44:06,239] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/mp_rank_00_model_states.pt
[2021-11-27 05:44:06,680] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_7_mp_rank_00_optim_states.pt
[2021-11-27 05:44:06,687] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_3_mp_rank_01_optim_states.pt
[2021-11-27 05:44:06,687] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,687] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,687] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,688] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,690] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,691] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,691] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,692] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,693] [INFO] [engine.py:2540:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,693] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,693] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,696] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,696] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,696] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,697] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,697] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,698] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,699] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,699] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,699] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,703] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,703] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,704] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,706] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,709] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,713] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,714] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,715] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,720] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,721] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,725] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,725] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,727] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,728] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,747] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,748] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,748] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,752] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,753] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,754] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,754] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,756] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,758] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,758] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,759] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,759] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,759] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,759] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,759] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,759] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,760] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,760] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,760] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,760] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,763] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,764] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,764] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,764] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-27 05:44:06,768] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,768] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,768] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-27 05:44:06,769] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step103500/zero_pp_rank_30_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 103500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2770.23 iteration 103600/ 152972 | consumed samples: 47963584 | consumed tokens: 98229420032 | elapsed time per iteration (ms): 4652.3 | learning rate: 6.118E-05 | global batch size: 512 | lm loss: 1.453396E+00 | loss scale: 65536.0 | grad norm: 7082.148 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 103800/ 152972 | consumed samples: 48065984 | consumed tokens: 98439135232 | elapsed time per iteration (ms): 4630.6 | learning rate: 
6.081E-05 | global batch size: 512 | lm loss: 1.481052E+00 | loss scale: 131072.0 | grad norm: 19526.260 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [2021-11-27 06:22:44,152] [INFO] [logging.py:68:log_dist] [Rank 0] step=104000, skipped=217, lr=[6.043939209121121e-05, 6.043939209121121e-05], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 104000 loss: 0.8824 iter time (s): 0.002 samples/sec: 221176.620 iteration 104000/ 152972 | consumed samples: 48168384 | consumed tokens: 98648850432 | elapsed time per iteration (ms): 4643.7 | learning rate: 6.044E-05 | global batch size: 512 | lm loss: 1.425528E+00 | loss scale: 131072.0 | grad norm: 10220.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | -------------------------------------------------------------------------------------------- valid loss at iteration 104000 | lm loss value: 1.473406E+00 | lm loss PPL: 4.364074E+00 | -------------------------------------------------------------------------------------------- iteration 104200/ 152972 | consumed samples: 48270784 | consumed tokens: 98858565632 | elapsed time per iteration (ms): 5169.0 | learning rate: 6.007E-05 | global batch size: 512 | lm loss: 1.500703E+00 | loss scale: 65536.0 | grad norm: 8209.802 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 104400/ 152972 | consumed samples: 48373184 | consumed tokens: 99068280832 | elapsed time per iteration (ms): 4626.2 | learning rate: 5.970E-05 | global batch size: 512 | lm loss: 1.476431E+00 | loss scale: 65536.0 | grad norm: 6806.013 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 104600/ 152972 | consumed samples: 48475584 | consumed tokens: 99277996032 | elapsed time per iteration (ms): 4631.5 | learning rate: 5.934E-05 | global batch size: 512 | lm loss: 1.466260E+00 | loss scale: 65536.0 | grad norm: 6227.666 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | iteration 104800/ 152972 | consumed samples: 48577984 | consumed tokens: 99487711232 | elapsed time per iteration (ms): 4632.6 | learning rate: 5.897E-05 | global batch size: 512 | lm loss: 1.471230E+00 | loss scale: 131072.0 | grad norm: 18039.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 105000/ 152972 | consumed samples: 48680384 | consumed tokens: 99697426432 | elapsed time per iteration (ms): 4626.5 | learning rate: 5.861E-05 | global batch size: 512 | lm loss: 1.488181E+00 | loss scale: 65536.0 | grad norm: 7416.858 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | -------------------------------------------------------------------------------------------- valid loss at iteration 105000 | lm loss value: 1.388958E+00 | lm loss PPL: 4.010671E+00 | -------------------------------------------------------------------------------------------- saving checkpoint at iteration 105000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-27 07:43:32,354] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/mp_rank_00_model_states.pt [2021-11-27 07:43:32,796] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,799] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,800] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,800] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,804] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,804] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,804] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,804] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,805] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,805] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,805] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,806] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,806] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,807] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,808] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,810] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,810] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,810] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,811] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,812] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,813] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,814] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,817] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,817] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,818] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,819] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,820] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,823] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,826] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,831] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,831] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,831] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,831] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,834] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,835] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,835] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,835] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,836] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,836] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,836] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,839] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,841] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,844] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,844] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,844] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,846] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,846] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,847] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,848] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,848] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,849] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,849] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,850] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,850] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,850] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,852] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,852] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,852] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-27 07:43:32,854] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,855] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,858] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,860] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,867] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-27 07:43:32,868] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step105000/zero_pp_rank_11_mp_rank_01_optim_states.pt successfully saved checkpoint at iteration 105000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2646.79 iteration 105200/ 152972 | consumed samples: 48782784 | consumed tokens: 99907141632 | elapsed time per iteration (ms): 5202.7 | learning rate: 5.824E-05 | global batch size: 512 | lm loss: 1.477839E+00 | loss scale: 65536.0 | grad norm: 5557.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 105400/ 152972 | consumed samples: 48885184 | consumed tokens: 100116856832 | elapsed time per iteration (ms): 4640.2 | learning rate: 
5.788E-05 | global batch size: 512 | lm loss: 1.449046E+00 | loss scale: 65536.0 | grad norm: 8869.372 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 105600/ 152972 | consumed samples: 48987584 | consumed tokens: 100326572032 | elapsed time per iteration (ms): 4643.3 | learning rate: 5.752E-05 | global batch size: 512 | lm loss: 1.442468E+00 | loss scale: 131072.0 | grad norm: 17602.134 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 105800/ 152972 | consumed samples: 49089984 | consumed tokens: 100536287232 | elapsed time per iteration (ms): 4646.0 | learning rate: 5.716E-05 | global batch size: 512 | lm loss: 1.436906E+00 | loss scale: 131072.0 | grad norm: 13271.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [2021-11-27 09:00:54,995] [INFO] [logging.py:68:log_dist] [Rank 0] step=106000, skipped=220, lr=[5.679480102498666e-05, 5.679480102498666e-05], mom=[(0.9, 0.999), (0.9, 0.999)] iteration 106000/ 152972 | consumed samples: 49192384 | consumed tokens: 100746002432 | elapsed time per iteration (ms): 4636.3 | learning rate: 5.679E-05 | global batch size: 512 | lm loss: 1.450937E+00 | loss scale: 262144.0 | grad norm: 23240.486 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | steps: 106000 loss: 1.0861 iter time (s): 0.002 samples/sec: 220146.444 -------------------------------------------------------------------------------------------- valid loss at iteration 106000 | lm loss value: 1.450381E+00 | lm loss PPL: 4.264738E+00 | -------------------------------------------------------------------------------------------- iteration 106200/ 152972 | consumed samples: 49294784 | consumed tokens: 100955717632 | elapsed time per iteration (ms): 5186.3 | learning rate: 5.644E-05 | global batch size: 512 | lm loss: 1.425632E+00 | loss scale: 131072.0 | grad norm: 10670.824 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | iteration 106400/ 152972 | consumed samples: 49397184 | consumed tokens: 101165432832 | elapsed time per iteration (ms): 4655.3 | learning rate: 5.608E-05 | global batch size: 512 | lm loss: 1.412134E+00 | loss scale: 65536.0 | grad norm: 8288.688 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | saving checkpoint at iteration 106500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-27 09:41:29,892] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/mp_rank_00_model_states.pt [2021-11-27 09:41:30,334] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,338] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,340] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-27 09:41:30,342] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-27 09:41:30,342] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-27 09:41:30,343] [INFO] [engine.py:2540:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,343] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,346] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-27 09:41:30,349] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-27 09:41:30,350] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,351] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,352] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-27 09:41:30,352] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-27 09:41:30,353] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-27 09:41:30,353] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,353] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,354] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-27 09:41:30,354] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,354] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,355] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,355] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,360] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-27 09:41:30,361] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-27 09:41:30,364] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,364] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,365] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-27 09:41:30,366] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,367] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,367] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,369] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-27 09:41:30,371] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-27 09:41:30,371] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-27 09:41:30,371] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,371] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,372] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-27 09:41:30,372] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-27 09:41:30,373] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,375] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-27 09:41:30,377] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-27 09:41:30,377] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-27 09:41:30,378] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,378] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,379] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,379] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-27 09:41:30,380] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-27 09:41:30,380] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,381] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,381] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,383] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-27 09:41:30,383] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,387] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,387] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-27 09:41:30,390] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-27 09:41:30,390] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,392] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,393] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,399] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-27 09:41:30,399] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-27 09:41:30,403] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-27 09:41:30,403] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-27 09:41:30,404] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-27 09:41:30,404] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_14_mp_rank_01_optim_states.pt
[2021-11-27 09:41:30,449] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_21_mp_rank_01_optim_states.pt
[2021-11-27 09:41:30,503] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step106500/zero_pp_rank_12_mp_rank_01_optim_states.pt
successfully saved checkpoint at iteration 106500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2676.44
iteration 106600/ 152972 | consumed samples: 49499584 | consumed tokens: 101375148032 | elapsed time per iteration (ms): 4663.7 | learning rate: 5.572E-05 | global batch size: 512 | lm loss: 1.443292E+00 | loss scale: 65536.0 | grad norm: 5637.258 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 106800/ 152972 | consumed samples: 49601984 | consumed tokens: 101584863232 | elapsed time per iteration (ms): 4654.2 | learning rate: 5.537E-05 | global batch size: 512 | lm loss: 1.452164E+00 | loss scale: 65536.0 | grad norm: 2323.600 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 107000/ 152972 | consumed samples: 49704384 | consumed tokens: 101794578432 | elapsed time per iteration (ms): 4642.8 | learning rate: 5.501E-05 | global batch size: 512 | lm loss: 1.435959E+00 | loss scale: 131072.0 | grad norm: 13364.832 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 107000 | lm loss value: 1.344946E+00 | lm loss PPL: 3.837980E+00 |
--------------------------------------------------------------------------------------------
iteration 107200/ 152972 | consumed samples: 49806784 | consumed tokens: 102004293632 | elapsed time per iteration (ms): 5177.5 | learning rate: 5.465E-05 | global batch size: 512 | lm loss: 1.421839E+00 | loss scale: 131072.0 | grad norm: 11207.419 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 107400/ 152972 | consumed samples: 49909184 | consumed tokens: 102214008832 | elapsed time per iteration (ms): 4629.7 | learning rate: 5.430E-05 | global batch size: 512 | lm loss: 1.469068E+00 | loss scale: 262144.0 | grad norm: 27396.133 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 107600/ 152972 | consumed samples: 50011584 | consumed tokens: 102423724032 | elapsed time per iteration (ms): 4642.6 | learning rate: 5.395E-05 | global batch size: 512 | lm loss: 1.501998E+00 | loss scale: 65536.0 | grad norm: 9313.869 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 107800/ 152972 | consumed samples: 50113984 | consumed tokens: 102633439232 | elapsed time per iteration (ms): 4625.8 | learning rate: 5.360E-05 | global batch size: 512 | lm loss: 1.399211E+00 | loss scale: 65536.0 | grad norm: 6361.415 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-27 11:39:18,380] [INFO] [logging.py:68:log_dist] [Rank 0] step=108000, skipped=226, lr=[5.324864073497269e-05, 5.324864073497269e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 108000 loss: 1.4070 iter time (s): 0.002 samples/sec: 221110.385
iteration 108000/ 152972 | consumed samples: 50216384 | consumed tokens: 102843154432 | elapsed time per iteration (ms): 4639.0 | learning rate: 5.325E-05 | global batch size: 512 | lm loss: 1.433393E+00 | loss scale: 65536.0 | grad norm: 7201.073 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0
| -------------------------------------------------------------------------------------------- valid loss at iteration 108000 | lm loss value: 1.516618E+00 | lm loss PPL: 4.556788E+00 | -------------------------------------------------------------------------------------------- saving checkpoint at iteration 108000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-27 11:41:08,344] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/mp_rank_00_model_states.pt [2021-11-27 11:41:08,791] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,794] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,795] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,795] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,795] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,798] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,798] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,798] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,799] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,799] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,802] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,802] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,804] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,804] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,804] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,804] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,804] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,805] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,805] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,806] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,806] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,807] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,808] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,811] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,811] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,811] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,813] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,820] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,821] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,825] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,826] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,826] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,827] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,828] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,830] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,830] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,830] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,832] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,832] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,833] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,833] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,833] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,834] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,834] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,835] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,835] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,835] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,836] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,838] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,839] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,840] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,843] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,844] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,845] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,846] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,849] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,851] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-27 11:41:08,851] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,853] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,854] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,854] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-27 11:41:08,859] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_12_mp_rank_01_optim_states.pt
[2021-11-27 11:41:08,861] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_25_mp_rank_00_optim_states.pt
[2021-11-27 11:41:08,863] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108000/zero_pp_rank_31_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 108000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2617.82
iteration 108200/ 152972 | consumed samples: 50318784 | consumed tokens: 103052869632 | elapsed time per iteration (ms): 5193.7 | learning rate: 5.290E-05 | global batch size: 512 | lm loss: 1.370427E+00 | loss scale: 131072.0 | grad norm: 14447.002 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 108400/ 152972 | consumed samples: 50421184 | consumed tokens: 103262584832 | elapsed time per iteration (ms): 4667.5 | learning rate: 5.255E-05 | global batch size: 512 | lm loss: 1.407369E+00 | loss scale: 131072.0 | grad norm: 21776.168 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 108600/ 152972 | consumed samples: 50523584 | consumed tokens: 103472300032 | elapsed time per iteration (ms): 4679.1 | learning rate: 5.220E-05 | global batch size: 512 | lm loss: 1.398895E+00 | loss scale: 131072.0 | grad norm: 18816.108 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 108800/ 152972 | consumed samples: 50625984 | consumed tokens: 103682015232 | elapsed time per iteration (ms): 4668.2 | learning rate: 5.186E-05 | global batch size: 512 | lm loss: 1.450623E+00 | loss scale: 131072.0 | grad norm: 10544.240 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 108886 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-27 12:50:03,687] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/mp_rank_00_model_states.pt
[2021-11-27 12:50:04,115] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_23_mp_rank_00_optim_states.pt
[2021-11-27 12:50:04,116] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_11_mp_rank_00_optim_states.pt
[2021-11-27 12:50:04,119] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_27_mp_rank_01_optim_states.pt
[2021-11-27 12:50:04,120] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_16_mp_rank_00_optim_states.pt
[2021-11-27 12:50:04,120] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_30_mp_rank_00_optim_states.pt
[2021-11-27 12:50:04,120] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_7_mp_rank_00_optim_states.pt
[2021-11-27 12:50:04,120] [INFO]
[engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-27 12:50:04,121] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-27 12:50:04,121] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-27 12:50:04,122] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-27 12:50:04,122] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-27 12:50:04,123] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-27 12:50:04,124] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-27 12:50:04,124] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-27 12:50:04,124] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-27 12:50:04,125] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-27 12:50:04,126] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-27 12:50:04,127] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-27 12:50:04,127] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-27 12:50:04,128] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-27 12:50:04,129] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-27 12:50:04,132] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-27 12:50:04,135] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-27 12:50:04,137] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-27 12:50:04,139] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-27 12:50:04,142] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-27 12:50:04,142] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-27 12:50:04,144] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-27 12:50:04,146] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-27 12:50:04,149] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-27 12:50:04,153] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-27 12:50:04,153] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-27 12:50:04,154] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-27 12:50:04,155] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-27 12:50:04,155] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-27 12:50:04,155] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-27 12:50:04,158] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-27 12:50:04,159] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-27 12:50:04,159] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-27 12:50:04,160] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-27 12:50:04,162] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-27 12:50:04,162] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-27 12:50:04,163] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-27 12:50:04,163] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-27 12:50:04,163] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-27 12:50:04,164] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-27 12:50:04,165] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-27 12:50:04,165] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-27 12:50:04,166] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-27 12:50:04,166] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-27 12:50:04,166] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-27 12:50:04,167] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-27 12:50:04,168] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-27 12:50:04,168] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-27 12:50:04,169] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-27 12:50:04,170] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-27 12:50:04,170] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-27 12:50:04,171] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-27 12:50:04,172] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-27 12:50:04,172] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-27 12:50:04,173] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-27 12:50:04,176] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-27 12:50:04,191] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_16_mp_rank_01_optim_states.pt
[2021-11-27 12:50:04,192] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step108886/zero_pp_rank_17_mp_rank_01_optim_states.pt
successfully saved checkpoint at iteration 108886 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2621.30
[exiting program after 1190.0311678012213 minutes] datetime: 2021-11-27 12:50:04
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja ..................
[OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- op name op name................op nameop name ................ installed ................................installed ..installedinstalled.. compatible.. ..compatible--------------------------------------------------compatible --------------------------------------------------compatible -------------------------------------------------- -------------------------------------------------- cpu_adam ............... cpu_adam[YES] cpu_adam............... ...... cpu_adam [YES]...............[OKAY] ............... ......[YES][YES] [OKAY]...... ...... [OKAY]fused_adam[OKAY] ............. [YES] ......fused_adam [OKAY]............. fused_adamfused_adam[YES] fused_lamb................................ .............[OKAY][YES] [YES][YES] .................. fused_lamb [OKAY][OKAY] [OKAY]............. [YES]fused_lamb ................... fused_lamb [OKAY] [YES] .............sparse_attn ......[YES]............ ......[OKAY][NO] [OKAY]....... [OKAY]sparse_attn ............ transformer[NO] ................... [YES][OKAY] ......sparse_attn transformer[OKAY] sparse_attn ........................ [YES]stochastic_transformer............[NO] ...... . [NO]....... [OKAY] [YES] ....... [OKAY] ......[OKAY] stochastic_transformer[OKAY] transformer transformer . ............ ............ [YES] [YES][YES]...... ............[OKAY] [OKAY][OKAY] stochastic_transformer stochastic_transformer. . [YES][YES] ............ 
[OKAY][OKAY] -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report -------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
------------------------------------------------------------------------------------------------------------------------------------------------------ --------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] ---------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------op name op nameop name op name................ ................ ................ ................ installed installedinstalled installed .. .. 
....compatible compatiblecompatible--------------------------------------------------compatible -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adam ............... cpu_adam[YES]cpu_adamcpu_adam ............... ..................... ............... [YES][OKAY] [YES][YES] ...... ...... ......[OKAY][OKAY] [OKAY] fused_adam ............. [YES]fused_adam fused_adamfused_adam...... .......................................[OKAY] [YES] [YES][YES] ............ ......fused_lamb [OKAY] [OKAY][OKAY]............. fused_lamb[YES] ............. fused_lamb......fused_lamb [YES] [OKAY]............. ................... [YES][OKAY][YES] ............ [OKAY][OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attn sparse_attnsparse_attntransformer............ ............[NO]........................ .......[YES][NO][NO] .......[OKAY]...... .......[OKAY][OKAY] transformer ............transformer[OKAY] [YES]stochastic_transformer ............ ...... transformer[OKAY].[YES] [YES].................. stochastic_transformer......[OKAY][YES] [OKAY]....... stochastic_transformer[YES][OKAY] ....... [OKAY][YES]stochastic_transformer ....... [OKAY][YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- --------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. -------------------------------------------------- DeepSpeed C++/CUDA extension op report-------------------------------------------------- ---------------------------------------------------------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. DeepSpeed C++/CUDA extension op reportNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninjaninjaninjaninja .................................... ....................................[OKAY][OKAY] [OKAY][OKAY]-------------------------------------------------- -------------------------------------------------- ---------------------------------------------------------------------------------------------------- op nameop name op name................op name ................installed................ ................ installed .. installedinstalled .. compatible.. .. compatible compatible--------------------------------------------------compatible-------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam cpu_adam cpu_adam [YES]............... ............... ...............[YES]...... [YES] ......[YES] [OKAY][OKAY] ...... ...... [OKAY][OKAY] fused_adam fused_adam............. fused_adamfused_adam ............. [YES].............[YES] ............. ......[YES] ...... [YES] [OKAY]...... [OKAY] ...... [OKAY] fused_lamb [OKAY] .............fused_lamb fused_lambfused_lamb[YES]............. ...................[YES]............. [YES]......[YES][OKAY] ......[OKAY] ...... 
[OKAY][OKAY] sparse_attn sparse_attn............ sparse_attnsparse_attn............[NO] [NO]............................... .......[NO] [OKAY][NO][OKAY] .............. [OKAY][OKAY]transformer transformer............ transformertransformer[YES]............ ..................[YES]............ [YES][OKAY]......[YES] ...... [OKAY] ...... [OKAY] [OKAY]stochastic_transformer stochastic_transformerstochastic_transformer. stochastic_transformer..[YES] [YES].[YES]...... ...... [YES] ......[OKAY] [OKAY] [OKAY]...... [OKAY] ninjaninjaninjaninja ........................................................................ [OKAY] [OKAY][OKAY] --------------------------------------------------[OKAY] -------------------------------------------------- --------------------------------------------------op name -------------------------------------------------- op nameop name................ op nameinstalled................ ................ .. installed................ installed compatible .. installed .. -------------------------------------------------- ..compatible compatible compatible---------------------------------------------------------------------------------------------------- -------------------------------------------------- cpu_adam ............... cpu_adam[YES]cpu_adam cpu_adam ..................... ............... ............... [YES][OKAY][YES] [YES] ............ ...... [OKAY] [OKAY] [OKAY] fused_adam ............. [YES] ......fused_adam .............[OKAY]fused_adamfused_adam [YES] ............. ............. ...... [YES]fused_lamb[OKAY][YES] ...................fused_lamb ...... [YES] .............[OKAY] [OKAY] [YES]...... ......[OKAY] fused_lamb[OKAY]fused_lamb .......................... [YES] [YES]...... ......[OKAY] [OKAY]sparse_attn ............ [NO]sparse_attn ................... [NO][OKAY] ....... [OKAY] sparse_attntransformersparse_attn ............ transformer............ ............ [YES]............ [NO] [NO] [YES] ............. ....... 
......[OKAY][OKAY] [OKAY][OKAY] stochastic_transformertransformer transformer............. stochastic_transformer [YES]............ [YES] ....... [YES] ...... [OKAY][YES] ...... [OKAY] ...... [OKAY] [OKAY] stochastic_transformer . [YES]stochastic_transformer ....... [YES][OKAY] ...... [OKAY]  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY][OKAY] -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- op nameop nameop name ................op name................................ ................ installedinstalled installed installed .. ..compatible.... 
--------------------------------------------------compatiblecompatible compatible -------------------------------------------------- -------------------------------------------------- -------------------------------------------------- cpu_adam ............... [YES] cpu_adam......cpu_adamcpu_adam ..............................[OKAY]............... [YES] [YES] [YES] ...... ...... ...... [OKAY] fused_adam[OKAY][OKAY] ............. [YES] ...... [OKAY] fused_adam fused_adam.............fused_adam fused_lamb ............. [YES] .......................... [YES] ......[YES][YES] ...... [OKAY]...... ...... [OKAY] [OKAY][OKAY] fused_lamb fused_lamb.............fused_lamb .............[YES]............. [YES]......[YES] sparse_attn......[OKAY]...... [OKAY]............[OKAY] [NO] ....... [OKAY] transformer ............ [YES] ...... sparse_attn[OKAY]sparse_attn sparse_attn........................ [NO]stochastic_transformer [NO]................... .[NO][OKAY]....... [YES] transformer.......[OKAY]...... [OKAY]............[OKAY] transformer[YES] ............transformer...... [YES] ............ [OKAY] ...... [YES] [OKAY] stochastic_transformer...... .[OKAY] stochastic_transformer[YES] ....... stochastic_transformer [YES] [OKAY] ....... [YES][OKAY] ...... [OKAY] ninjaninjaninja ninja...................................................... ..................[OKAY] [OKAY][OKAY] --------------------------------------------------[OKAY] ----------------------------------------------------------------------------------------------------op name-------------------------------------------------- ................op name op nameop nameinstalled ................ .................. ................ installedinstalledcompatible installed.... -------------------------------------------------- .. 
compatiblecompatible compatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ............... [YES] cpu_adam...... cpu_adamcpu_adam............... [OKAY] [YES]  [WARNING]  async_io: please install the libaio-devel package with yum .............................. ......[YES][YES] [OKAY]fused_adam ............ .............[OKAY] [OKAY] [YES]  [WARNING]  async_io: please install the libaio-devel package with yum fused_adam...... .............[OKAY] [YES] fused_adam......fused_adam fused_lamb[OKAY] ............. ............. ............. [YES] [YES][YES]fused_lamb ............ ................... [OKAY] [OKAY][YES][OKAY] ...... [OKAY]fused_lamb fused_lamb .......................... [YES][YES] ......sparse_attn ..................[OKAY] [NO][OKAY] sparse_attn ....... ............[OKAY] [NO] ....... transformer[OKAY] ............ [YES]transformer sparse_attn......sparse_attn ............[OKAY] ........................ [YES] [NO]......[NO] stochastic_transformer ....... .......[OKAY] .  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. [OKAY][OKAY] stochastic_transformer [YES] .......transformertransformer [OKAY] [YES]  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.async_io ........................ ......[YES][YES] [OKAY]............ ............... [NO] ....... [NO] [OKAY][OKAY] stochastic_transformerstochastic_transformer .. [YES][YES] ............ [OKAY][OKAY] async_io ............... transformer_inference[NO] ......... [NO][NO] ....... [OKAY] utils .................. transformer_inference[YES] ........ [NO][OKAY] ....... [OKAY] quantizer ..............utils [NO].................. 
.......[YES] ......[OKAY] [OKAY] --------------------------------------------------quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... 
[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report -------------------------------------------------- DeepSpeed C++/CUDA extension op reportJIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.--------------------------------------------------JIT compiled ops requires ninja JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum ------------------------------------------------------------------------------------------------------------------------------------------------------ DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.DeepSpeed C++/CUDA extension op report -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaNOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_ioasync_io .............................. [NO][NO] .............. [NO][NO] transformer_inferencetransformer_inference .... [NO][NO] .............. [OKAY][OKAY] utilsutils .................................... [YES][YES] ............ [OKAY][OKAY] quantizerquantizer ............................ [NO][NO] .............. [OKAY][OKAY] ----------------------------------------------------------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. 
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ......
torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum DeepSpeed general environment info:  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 async_io ............... [NO] ....... [NO] deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- ----------------------------------------------------------------------------------------------------DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report-------------------------------------------------- ----------------------------------------------------------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ------------------------------------------------------------------------------------------------------------------------------------------------------ JIT compiled ops requires ninja JIT compiled ops requires ninjaJIT compiled ops requires ninja ninjaninjaninjaninja ........................................................................ [OKAY][OKAY][OKAY] [OKAY] ------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------------------- op nameop name op name................op name................ ................................installedinstalled installedinstalled.... 
....compatiblecompatible compatiblecompatible---------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------- cpu_adam ...............cpu_adam cpu_adamcpu_adam [YES] ............... .................................... [OKAY][YES] [YES] [YES]...... ......[OKAY] ......[OKAY]fused_adam .............[OKAY] [YES] ......fused_adam fused_adam[OKAY] .............fused_adam............. [YES]fused_lamb[YES] ............. ......................... [OKAY][YES][YES][OKAY] ............ fused_lamb[OKAY][OKAY]fused_lamb ............. .............[YES] [YES]...... fused_lamb......[OKAY] [OKAY]............. sparse_attn [YES]............ ......[NO] .......[OKAY] sparse_attn[OKAY] sparse_attn............ transformer ............ [NO] ............ [NO] ....... [YES] ....... [OKAY] ...... [OKAY] [OKAY] transformersparse_attntransformer stochastic_transformer ............ ........................ . [YES] [YES][NO][YES] ...... ...... ............. [OKAY][OKAY][OKAY] [OKAY] transformerstochastic_transformerstochastic_transformer .............. [YES][YES] ......[YES]...... [OKAY] [OKAY] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY]  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... 
[NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. transformer_inference .. [NO] async_io....... [OKAY]............... [NO] ....... [NO] utils .................. [YES] ...... [OKAY] transformer_inference ..quantizer [NO].............. .......[NO] [OKAY]....... [OKAY] utils --------------------------------------------------.................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
using world size: 64, data-parallel-size: 32, tensor-model-parallel size: 1, pipeline-model-parallel size: 2
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. None
consumed_train_samples .......................... 0
consumed_train_tokens ........................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
curriculum_learning ............................. False
data_impl ....................................... mmap
data_parallel_size .............................. 32
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.27447.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 100
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 8192
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
gigaflos_no_embeds .............................. 0
global_batch_size ............................... 512
glu_activation .................................. None
hidden_dropout .................................. 0.1
hidden_size ..................................... 2048
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 128
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 200
log_learning_rate_to_tensorboard ................ True
log_level ....................................... None
log_level_replica ............................... None
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_on_targets_only ............................ True
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 0.0002
lr_decay_iters .................................. None
lr_decay_samples ................................ 73242187
lr_decay_style .................................. cosine
lr_decay_tokens ................................. None
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 183105
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/data/gpt2-merges.txt
micro_batch_size ................................ 1
min_loss_scale .................................. 1.0
min_lr .......................................... 1e-05
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 16
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 24
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 2
position_embedding_type ......................... PositionEmbeddingType.absolute
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['32', '32', '2_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
reweight_loss_based_on_position_frequency ....... True
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 1234
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 1
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/tr6g-1B3-oscar-loss-reweighting-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
test_weighted_split_names ....................... None
test_weighted_split_paths ....................... None
test_weighted_split_splits ...................... None
test_weighted_split_weights ..................... None
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 73242187
train_tokens .................................... None
train_weighted_split_paths ...................... None
use_bnb_optimizer ............................... False
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
valid_weighted_split_names ...................... None
valid_weighted_split_paths ...................... None
valid_weighted_split_splits ..................... None
valid_weighted_split_weights .................... None
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 64
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
> building GPT2BPETokenizer tokenizer ...
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
DeepSpeed general environment info:
torch install path ............... 
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
> padded vocab (size: 50257) with 47 dummy tokens (new size: 50304)
> initializing torch distributed ...
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] ....... [OKAY]
--------------------------------------------------
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 
1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. > setting tensorboard ... **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... 
[OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** > initializing tensor model parallel with size 1 > initializing pipeline model parallel with size 2 > setting random seeds to 1234 ... [2021-11-27 12:50:56,551] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234 > compiling dataset index builder ... make: Entering directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/data' make: Nothing to be done for 'default'. make: Leaving directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/data' >>> done with dataset index builder. Compilation time: 0.360 seconds > compiling and loading fused kernels ... /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! 
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module scaled_upper_triang_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module scaled_upper_triang_masked_softmax_cuda... /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... 
Building extension module scaled_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module scaled_masked_softmax_cuda... /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module fused_mix_prec_layer_norm_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) ninja: no work to do. Loading extension module fused_mix_prec_layer_norm_cuda... /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. 
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( >>> done with compiling and loading fused kernels. Compilation time: 16.500 seconds time to initialize megatron (seconds): 49.818 [after megatron is initialized] datetime: 2021-11-27 12:51:13 building GPT model ...
[2021-11-27 12:51:13,871] [INFO] [utils.py:806:see_memory_usage] Before Building Model [2021-11-27 12:51:13,879] [INFO] [utils.py:807:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB [2021-11-27 12:51:13,880] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 39.48 GB, percent = 21.1% SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pipe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=1, data=0, model=0): 32, ProcessCoord(pipe=1, data=1, model=0): 33, ProcessCoord(pipe=1, data=2, model=0): 34, ProcessCoord(pipe=1, data=3, model=0): 35, ProcessCoord(pipe=1, data=4, model=0): 36, 
ProcessCoord(pipe=1, data=5, model=0): 37, ProcessCoord(pipe=1, data=6, model=0): 38, ProcessCoord(pipe=1, data=7, model=0): 39, ProcessCoord(pipe=1, data=8, model=0): 40, ProcessCoord(pipe=1, data=9, model=0): 41, ProcessCoord(pipe=1, data=10, model=0): 42, ProcessCoord(pipe=1, data=11, model=0): 43, ProcessCoord(pipe=1, data=12, model=0): 44, ProcessCoord(pipe=1, data=13, model=0): 45, ProcessCoord(pipe=1, data=14, model=0): 46, ProcessCoord(pipe=1, data=15, model=0): 47, ProcessCoord(pipe=1, data=16, model=0): 48, ProcessCoord(pipe=1, data=17, model=0): 49, ProcessCoord(pipe=1, data=18, model=0): 50, ProcessCoord(pipe=1, data=19, model=0): 51, ProcessCoord(pipe=1, data=20, model=0): 52, ProcessCoord(pipe=1, data=21, model=0): 53, ProcessCoord(pipe=1, data=22, model=0): 54, ProcessCoord(pipe=1, data=23, model=0): 55, ProcessCoord(pipe=1, data=24, model=0): 56, ProcessCoord(pipe=1, data=25, model=0): 57, ProcessCoord(pipe=1, data=26, model=0): 58, ProcessCoord(pipe=1, data=27, model=0): 59, ProcessCoord(pipe=1, data=28, model=0): 60, ProcessCoord(pipe=1, data=29, model=0): 61, ProcessCoord(pipe=1, data=30, model=0): 62, ProcessCoord(pipe=1, data=31, model=0): 63} [2021-11-27 12:51:15,185] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer stage=0 layers=15 0: _to_float16 1: EmbeddingPipe 2: 3: ParallelTransformerLayerPipe 4: ParallelTransformerLayerPipe 5: ParallelTransformerLayerPipe 6: ParallelTransformerLayerPipe 7: ParallelTransformerLayerPipe 8: ParallelTransformerLayerPipe 9: ParallelTransformerLayerPipe 10: ParallelTransformerLayerPipe 11: ParallelTransformerLayerPipe 12: ParallelTransformerLayerPipe 13: ParallelTransformerLayerPipe 14: ParallelTransformerLayerPipe stage=1 layers=17 15: ParallelTransformerLayerPipe 16: ParallelTransformerLayerPipe 17: ParallelTransformerLayerPipe 18: ParallelTransformerLayerPipe 19: ParallelTransformerLayerPipe 20: ParallelTransformerLayerPipe 21: ParallelTransformerLayerPipe 
22: ParallelTransformerLayerPipe 23: ParallelTransformerLayerPipe 24: ParallelTransformerLayerPipe 25: ParallelTransformerLayerPipe 26: ParallelTransformerLayerPipe 27: 28: 29: MixedFusedLayerNorm 30: EmbeddingPipe 31: float16_to_fp32 loss: CrossEntropy > number of parameters on (tensor, pipeline) model parallel rank (0, 1): 711520256 [2021-11-27 12:51:15,763] [INFO] [utils.py:806:see_memory_usage] After Building Model [2021-11-27 12:51:15,764] [INFO] [utils.py:807:see_memory_usage] MA 1.33 GB Max_MA 1.33 GB CA 1.36 GB Max_CA 1 GB [2021-11-27 12:51:15,764] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 39.7 GB, percent = 21.2% > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 711516160 setting training iterations to 152972 > learning rate decay style: cosine DeepSpeed is enabled. [2021-11-27 12:51:15,784] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.5.5+58a8e13, git-hash=58a8e13, git-branch=master [2021-11-27 12:51:16,089] [INFO] [engine.py:207:__init__] DeepSpeed Flops Profiler Enabled: False [2021-11-27 12:51:16,089] [INFO] [engine.py:862:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer [2021-11-27 12:51:16,089] [INFO] [engine.py:868:_configure_optimizer] Using client Optimizer as basic optimizer [2021-11-27 12:51:16,093] [INFO] [engine.py:884:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam [2021-11-27 12:51:16,093] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type= [2021-11-27 12:51:16,093] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer [2021-11-27 12:51:16,093] [INFO] [stage2.py:111:__init__] Reduce bucket size 500000000 [2021-11-27 12:51:16,093] [INFO] [stage2.py:112:__init__] Allgather bucket size 500000000 [2021-11-27 12:51:16,093] [INFO] [stage2.py:113:__init__] CPU Offload: False [2021-11-27 12:51:16,093] [INFO] [stage2.py:114:__init__] Round robin gradient 
partitioning: False Rank: 55 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 38 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 52 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 42 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 33 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 7 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 63 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 23 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 51 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 46 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 41 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 36 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 45 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 19 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 20 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 0 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 60 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 57 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 58 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 48 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 35 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 9 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 11 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 18 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 6 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 12 
partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 29 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 13 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 25 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 37 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 28 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 24 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 40 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 56 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 49 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 5 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 32 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 53 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 44 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 61 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 39 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 34 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 54 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 16 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 8 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 14 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 50 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 47 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 17 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 43 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 21 partition count [32, 32] and 
sizes[(22224896, False), (9984, False)] Rank: 10 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 22 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 30 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 62 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 15 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 27 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 59 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 26 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 31 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 4 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 3 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 1 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 2 partition count [32, 32] and sizes[(22224896, False), (9984, False)] [2021-11-27 12:51:19,388] [INFO] [utils.py:806:see_memory_usage] Before initializing optimizer states [2021-11-27 12:51:19,389] [INFO] [utils.py:807:see_memory_usage] MA 1.41 GB Max_MA 1.45 GB CA 2.71 GB Max_CA 3 GB [2021-11-27 12:51:19,389] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 41.46 GB, percent = 22.1% [2021-11-27 12:51:19,425] [INFO] [utils.py:806:see_memory_usage] After initializing optimizer states [2021-11-27 12:51:19,425] [INFO] [utils.py:807:see_memory_usage] MA 1.57 GB Max_MA 1.66 GB CA 2.96 GB Max_CA 3 GB [2021-11-27 12:51:19,425] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 41.57 GB, percent = 22.2% [2021-11-27 12:51:19,426] [INFO] [stage2.py:474:__init__] optimizer state initialized [2021-11-27 12:51:19,456] [INFO] [utils.py:806:see_memory_usage] After initializing ZeRO optimizer [2021-11-27 12:51:19,457] [INFO] [utils.py:807:see_memory_usage] MA 1.57 GB Max_MA 1.57 GB CA 
2.96 GB Max_CA 3 GB [2021-11-27 12:51:19,457] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 41.64 GB, percent = 22.2% [2021-11-27 12:51:19,457] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [2021-11-27 12:51:19,457] [INFO] [engine.py:599:_configure_lr_scheduler] DeepSpeed using client LR scheduler [2021-11-27 12:51:19,457] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2021-11-27 12:51:19,457] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)] [2021-11-27 12:51:19,457] [INFO] [config.py:940:print] DeepSpeedEngine configuration: [2021-11-27 12:51:19,457] [INFO] [config.py:944:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] allreduce_always_fp32 ........ False [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] amp_enabled .................. False [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] amp_params ................... False [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] checkpoint_tag_validation_enabled True [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] checkpoint_tag_validation_fail False [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] curriculum_enabled ........... False [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] curriculum_params ............ False [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] dataloader_drop_last ......... False [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] disable_allgather ............ 
False [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] dump_state ................... False [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1} [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] eigenvalue_enabled ........... False [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] eigenvalue_gas_boundary_resolution 1 [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] eigenvalue_layer_name ........ bert.encoder.layer [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] eigenvalue_layer_num ......... 0 [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] eigenvalue_max_iter .......... 100 [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] eigenvalue_stability ......... 1e-06 [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] eigenvalue_tol ............... 0.01 [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] eigenvalue_verbose ........... False [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] elasticity_enabled ........... False [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] fp16_enabled ................. True [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] fp16_master_weights_and_gradients False [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] fp16_mixed_quantize .......... False [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] global_rank .................. 0 [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] gradient_accumulation_steps .. 16 [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] gradient_clipping ............ 1.0 [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] gradient_predivide_factor .... 
1.0 [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] initial_dynamic_scale ........ 4096 [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] loss_scale ................... 0 [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] memory_breakdown ............. False [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] optimizer_legacy_fusion ...... False [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] optimizer_name ............... None [2021-11-27 12:51:19,458] [INFO] [config.py:944:print] optimizer_params ............. None [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] pld_enabled .................. False [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] pld_params ................... False [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] prescale_gradients ........... False [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] quantize_change_rate ......... 0.001 [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] quantize_groups .............. 1 [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] quantize_offset .............. 1000 [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] quantize_period .............. 1000 [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] quantize_rounding ............ 0 [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] quantize_start_bits .......... 16 [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] quantize_target_bits ......... 8 [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] quantize_training_enabled .... False [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] quantize_type ................ 0 [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] quantize_verbose ............. False [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] scheduler_name ............... 
None [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] scheduler_params ............. None [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] sparse_attention ............. None [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] sparse_gradients_enabled ..... False [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] steps_per_print .............. 2000 [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] tensorboard_enabled .......... False [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] tensorboard_job_name ......... DeepSpeedJobName [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] tensorboard_output_path ...... [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] train_batch_size ............. 512 [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] train_micro_batch_size_per_gpu 1 [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] use_quantizer_kernel ......... False [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] wall_clock_breakdown ......... False [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] world_size ................... 32 [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] zero_allow_untested_optimizer False [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] zero_config .................. { "stage": 1, "contiguous_gradients": true, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false } [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] zero_enabled ................. 
True [2021-11-27 12:51:19,459] [INFO] [config.py:944:print] zero_optimization_stage ...... 1 [2021-11-27 12:51:19,459] [INFO] [config.py:946:print] json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 512, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false } [2021-11-27 12:51:19,460] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=16 micro_batch_size=1 [2021-11-27 12:51:19,500] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=15 [0, 15) STAGE_PARAMS=711516160 (711.516M) TOTAL_PARAMS=1423036416 (1423.036M) UNIQUE_PARAMS=1315819520 (1315.820M) [2021-11-27 12:51:19,500] [INFO] [engine.py:151:__init__] RANK=32 STAGE=1 LAYERS=17 [15, 32) STAGE_PARAMS=711520256 (711.520M) TOTAL_PARAMS=1423036416 (1423.036M) UNIQUE_PARAMS=1315819520 (1315.820M) > using checkpoint value 0.0002 for learning rate > using checkpoint value 1e-05 for minimum learning rate > using checkpoint value 183105 for warmup iterations > using checkpoint value 73242187 for total number of iterations > using checkpoint value cosine for decay style successfully loaded 32 ZeRO state_dicts for rank 2 successfully loaded 32 ZeRO state_dicts for rank 0 successfully loaded 32 ZeRO state_dicts for rank 1 successfully loaded 32 ZeRO state_dicts for rank 3 successfully loaded 32 ZeRO state_dicts for rank 30successfully loaded 32 ZeRO state_dicts for rank 29 successfully loaded 32 ZeRO state_dicts for rank 28 successfully loaded 32 ZeRO state_dicts for rank 31 successfully loaded 32 ZeRO state_dicts for rank 8 successfully loaded 32 ZeRO state_dicts for rank 60 successfully loaded 32 ZeRO state_dicts for rank 61 successfully loaded 32 ZeRO state_dicts for rank 62 successfully loaded 32 ZeRO state_dicts for rank 63 successfully loaded 32 ZeRO state_dicts for rank 9 successfully loaded 
32 ZeRO state_dicts for rank 10
successfully loaded 32 ZeRO state_dicts for rank 11
successfully loaded 32 ZeRO state_dicts for rank 24
successfully loaded 32 ZeRO state_dicts for rank 26
successfully loaded 32 ZeRO state_dicts for rank 25
successfully loaded 32 ZeRO state_dicts for rank 27
loading 32 zero partition checkpoints for rank 2
loading 32 zero partition checkpoints for rank 1
successfully loaded 32 ZeRO state_dicts for rank 44
successfully loaded 32 ZeRO state_dicts for rank 46
successfully loaded 32 ZeRO state_dicts for rank 45
successfully loaded 32 ZeRO state_dicts for rank 47
loading 32 zero partition checkpoints for rank 30
loading 32 zero partition checkpoints for rank 29
loading 32 zero partition checkpoints for rank 31
loading 32 zero partition checkpoints for rank 28
successfully loaded 32 ZeRO state_dicts for rank 15
successfully loaded 32 ZeRO state_dicts for rank 12
successfully loaded 32 ZeRO state_dicts for rank 4
successfully loaded 32 ZeRO state_dicts for rank 7
successfully loaded 32 ZeRO state_dicts for rank 6
successfully loaded 32 ZeRO state_dicts for rank 5
successfully loaded 32 ZeRO state_dicts for rank 22
successfully loaded 32 ZeRO state_dicts for rank 23
successfully loaded 32 ZeRO state_dicts for rank 20
successfully loaded 32 ZeRO state_dicts for rank 21
loading 32 zero partition checkpoints for rank 9
loading 32 zero partition checkpoints for rank 63
loading 32 zero partition checkpoints for rank 10
loading 32 zero partition checkpoints for rank 61
loading 32 zero partition checkpoints for rank 11
loading 32 zero partition checkpoints for rank 60
loading 32 zero partition checkpoints for rank 8
successfully loaded 32 ZeRO state_dicts for rank 13
successfully loaded 32 ZeRO state_dicts for rank 14
loading 32 zero partition checkpoints for rank 62
successfully loaded 32 ZeRO state_dicts for rank 56
successfully loaded 32 ZeRO state_dicts for rank 59
successfully loaded 32 ZeRO state_dicts for rank 57
successfully loaded 32 ZeRO
state_dicts for rank 58
successfully loaded 32 ZeRO state_dicts for rank 55
successfully loaded 32 ZeRO state_dicts for rank 52
successfully loaded 32 ZeRO state_dicts for rank 53
successfully loaded 32 ZeRO state_dicts for rank 54
loading 32 zero partition checkpoints for rank 27
loading 32 zero partition checkpoints for rank 26
loading 32 zero partition checkpoints for rank 25
loading 32 zero partition checkpoints for rank 24
loading 32 zero partition checkpoints for rank 45
loading 32 zero partition checkpoints for rank 44
successfully loaded 32 ZeRO state_dicts for rank 17
successfully loaded 32 ZeRO state_dicts for rank 19
successfully loaded 32 ZeRO state_dicts for rank 18
successfully loaded 32 ZeRO state_dicts for rank 16
loading 32 zero partition checkpoints for rank 47
loading 32 zero partition checkpoints for rank 46
loading 32 zero partition checkpoints for rank 0
checkpoint version 3.0
successfully loaded 32 ZeRO state_dicts for rank 48
successfully loaded 32 ZeRO state_dicts for rank 49
successfully loaded 32 ZeRO state_dicts for rank 50
successfully loaded 32 ZeRO state_dicts for rank 51
loading 32 zero partition checkpoints for rank 15
successfully loaded 32 ZeRO state_dicts for rank 33
successfully loaded 32 ZeRO state_dicts for rank 35
successfully loaded 32 ZeRO state_dicts for rank 32
successfully loaded 32 ZeRO state_dicts for rank 34
loading 32 zero partition checkpoints for rank 7
loading 32 zero partition checkpoints for rank 5
loading 32 zero partition checkpoints for rank 20
loading 32 zero partition checkpoints for rank 23
loading 32 zero partition checkpoints for rank 4
loading 32 zero partition checkpoints for rank 6
loading 32 zero partition checkpoints for rank 22
loading 32 zero partition checkpoints for rank 21
loading 32 zero partition checkpoints for rank 3
loading 32 zero partition checkpoints for rank 59
loading 32 zero partition checkpoints for rank 57
loading 32 zero partition checkpoints for rank 56
loading 32 zero partition
checkpoints for rank 58
successfully loaded 32 ZeRO state_dicts for rank 42
successfully loaded 32 ZeRO state_dicts for rank 40
successfully loaded 32 ZeRO state_dicts for rank 43
successfully loaded 32 ZeRO state_dicts for rank 41
loading 32 zero partition checkpoints for rank 55
loading 32 zero partition checkpoints for rank 53
loading 32 zero partition checkpoints for rank 52
loading 32 zero partition checkpoints for rank 54
loading 32 zero partition checkpoints for rank 13
successfully loaded 32 ZeRO state_dicts for rank 39
successfully loaded 32 ZeRO state_dicts for rank 37
successfully loaded 32 ZeRO state_dicts for rank 38
successfully loaded 32 ZeRO state_dicts for rank 36
loading 32 zero partition checkpoints for rank 17
loading 32 zero partition checkpoints for rank 18
loading 32 zero partition checkpoints for rank 19
loading 32 zero partition checkpoints for rank 16
loading 32 zero partition checkpoints for rank 12
loading 32 zero partition checkpoints for rank 48
loading 32 zero partition checkpoints for rank 49
loading 32 zero partition checkpoints for rank 51
loading 32 zero partition checkpoints for rank 50
loading 32 zero partition checkpoints for rank 33
loading 32 zero partition checkpoints for rank 35
loading 32 zero partition checkpoints for rank 32
loading 32 zero partition checkpoints for rank 34
loading 32 zero partition checkpoints for rank 14
loading 32 zero partition checkpoints for rank 43
loading 32 zero partition checkpoints for rank 40
loading 32 zero partition checkpoints for rank 41
loading 32 zero partition checkpoints for rank 42
loading 32 zero partition checkpoints for rank 39
loading 32 zero partition checkpoints for rank 36
loading 32 zero partition checkpoints for rank 37
loading 32 zero partition checkpoints for rank 38
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints at iteration 108886
time (ms) | load-checkpoint: 31385.15
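Note on the numbers reported so far: the batch-size and parameter-count values printed above can be cross-checked against each other. The sketch below is only a sanity check, not part of the training code. It assumes the usual DeepSpeed relation (global batch = micro batch per GPU x gradient-accumulation steps x data-parallel size) and assumes that `world_size 32` is the data-parallel size, which seems consistent with 64 ranks split over two pipeline stages. All variable names are illustrative.

```python
# Values copied from the DeepSpeed config and pipeline-engine records in this log.
train_batch_size = 512          # "train_batch_size"
micro_batch_size_per_gpu = 1    # "train_micro_batch_size_per_gpu"
micro_batches = 16              # engine.py "CONFIG: micro_batches=16"
data_parallel_size = 32         # "world_size" (assumed to be the data-parallel size)

# Assumed relation: global batch = micro batch * accumulation steps * DP size.
derived_batch_size = micro_batch_size_per_gpu * micro_batches * data_parallel_size

# STAGE_PARAMS reported for pipeline stages 0 and 1, and the reported TOTAL_PARAMS.
stage_params = {0: 711_516_160, 1: 711_520_256}
total_params = 1_423_036_416

batch_size_consistent = derived_batch_size == train_batch_size
param_count_consistent = sum(stage_params.values()) == total_params

print("batch size consistent:", batch_size_consistent)
print("stage params sum to total:", param_count_consistent)
```

If either check printed False for a different run, the first things to compare would be the parallelism settings and the config values copied above.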
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings
  warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
estimated model parameters: 1.42303232
estimated model parameters without embeddings: 1.208598528
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-11-27 12:51:50
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
train: 73242187
validation: 7833600
test: 51200
> building train, validation, and test datasets for GPT ...
> building dataset index ...
reading sizes...
reading pointers...
reading document index...
creating numpy buffer of mmap...
creating memory view of numpy buffer...
> finished creating indexed dataset in 7.147451 seconds
    number of documents: 304230423
> dataset split:
    train:
        document indices in [0, 288714672) total of 288714672 documents
    validation:
        document indices in [288714672, 303926193) total of 15211521 documents
    test:
        document indices in [303926193, 304230423) total of 304230 documents
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.122 seconds
    total number of samples: 131537224
    total number of epochs: 1
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.111 seconds
    total number of samples: 13854322
    total number of epochs: 2
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
    loaded indexed file in 0.060 seconds
    total number of samples: 137384
    total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-11-27 12:52:09
done with setup ...
training ...
time (ms) | model-and-optimizer-setup: 37100.23 | train/valid/test-data-iterators-setup: 18598.23
Number of parameters: 1.423040512 billion
Number of parameters: 1.42303232 billion
Number of parameters without embeddings: 1.20860672 billion
Number of parameters without embeddings: 1.208598528 billion
[before the start of training step] datetime: 2021-11-27 12:52:09
[2021-11-27 12:52:09,778] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information
[2021-11-27 12:52:09,778] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-11-27 12:52:09,778] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-11-27 12:52:09,778] [INFO] [checkpointing.py:554:forward] ----Synchronization False
[2021-11-27 12:52:09,778] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False
/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
  if t.grad is not None:
[Rank 32] (after 109000 iterations) memory (MB) | allocated: 2443.63623046875 | max allocated: 4725.25341796875 | reserved: 7900.0 | max reserved: 7900.0
[Rank 0] (after 109000 iterations) memory (MB) | allocated: 1631.6650390625 | max allocated: 3929.2744140625 | reserved: 6816.0 | max reserved: 6816.0
iteration 109000/ 152972 | consumed samples: 50728384 | consumed tokens: 103891730432 | elapsed time per iteration (ms): 4742.9 | learning rate: 5.151E-05 | global batch size: 512 | lm loss: 1.370411E+00 | loss scale: 262144.0 | grad norm: 25719.122 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 109000 | lm loss value: 1.468052E+00 | lm loss PPL: 4.340770E+00 |
--------------------------------------------------------------------------------------------
iteration 109200/ 152972 | consumed samples: 50830784 | consumed tokens: 104101445632 | elapsed time per iteration (ms): 5239.3 | learning rate: 5.117E-05 | global batch size: 512 | lm loss: 1.468671E+00 | loss scale: 262144.0 | grad norm: 31653.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 109400/ 152972 | consumed samples: 50933184 | consumed tokens: 104311160832 | elapsed time per iteration (ms): 4673.8 | learning rate: 5.082E-05 | global batch size: 512 | lm loss: 1.366963E+00 | loss scale: 262144.0 | grad norm: 26420.507 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 109500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-27 13:42:05,071] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/mp_rank_00_model_states.pt
[2021-11-27 13:42:05,504] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_0_mp_rank_01_optim_states.pt
[2021-11-27 13:42:05,507] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_3_mp_rank_01_optim_states.pt
[2021-11-27 13:42:05,531] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_2_mp_rank_01_optim_states.pt
[2021-11-27 13:42:05,534] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_2_mp_rank_00_optim_states.pt
[2021-11-27 13:42:05,536] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_4_mp_rank_00_optim_states.pt
[2021-11-27 13:42:05,541] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_1_mp_rank_01_optim_states.pt
[2021-11-27 13:42:05,541] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_7_mp_rank_00_optim_states.pt
[2021-11-27 13:42:05,553] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_10_mp_rank_00_optim_states.pt
[2021-11-27 13:42:05,553] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_11_mp_rank_00_optim_states.pt
[2021-11-27 13:42:05,553] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_8_mp_rank_00_optim_states.pt
[2021-11-27 13:42:05,554] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_21_mp_rank_01_optim_states.pt
[2021-11-27 13:42:05,554] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_23_mp_rank_01_optim_states.pt
[2021-11-27 13:42:05,556] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_19_mp_rank_01_optim_states.pt
[2021-11-27 13:42:05,556] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_17_mp_rank_01_optim_states.pt
[2021-11-27 13:42:05,556] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_27_mp_rank_00_optim_states.pt
[2021-11-27 13:42:05,556] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_25_mp_rank_00_optim_states.pt
[2021-11-27 13:42:05,559] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_5_mp_rank_01_optim_states.pt
[2021-11-27 13:42:05,561] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_14_mp_rank_01_optim_states.pt
[2021-11-27 13:42:05,561] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_12_mp_rank_01_optim_states.pt
[2021-11-27 13:42:05,563] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_22_mp_rank_00_optim_states.pt
[2021-11-27 13:42:05,563] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_21_mp_rank_00_optim_states.pt
[2021-11-27 13:42:05,563] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_18_mp_rank_00_optim_states.pt
[2021-11-27 13:42:05,563] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_19_mp_rank_00_optim_states.pt
[2021-11-27 13:42:05,563] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_17_mp_rank_00_optim_states.pt
[2021-11-27 13:42:05,563] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_8_mp_rank_01_optim_states.pt
[2021-11-27 13:42:05,566] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_9_mp_rank_00_optim_states.pt
[2021-11-27 13:42:05,566] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_31_mp_rank_01_optim_states.pt
[2021-11-27 13:42:05,566] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_28_mp_rank_01_optim_states.pt
[2021-11-27 13:42:05,566] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_6_mp_rank_00_optim_states.pt
[2021-11-27 13:42:05,567] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2021-11-27 13:42:05,567] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_18_mp_rank_01_optim_states.pt
[2021-11-27 13:42:05,567] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_22_mp_rank_01_optim_states.pt
[2021-11-27 13:42:05,568] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_24_mp_rank_01_optim_states.pt
[2021-11-27 13:42:05,568] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_27_mp_rank_01_optim_states.pt
[2021-11-27 13:42:05,568] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_5_mp_rank_00_optim_states.pt
[2021-11-27 13:42:05,569] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_20_mp_rank_01_optim_states.pt
[2021-11-27 13:42:05,569] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_4_mp_rank_01_optim_states.pt
[2021-11-27 13:42:05,571] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_1_mp_rank_00_optim_states.pt
[2021-11-27 13:42:05,572] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_25_mp_rank_01_optim_states.pt
[2021-11-27 13:42:05,578] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_28_mp_rank_00_optim_states.pt
[2021-11-27 13:42:05,578] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_30_mp_rank_00_optim_states.pt
[2021-11-27 13:42:05,578] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_29_mp_rank_01_optim_states.pt
[2021-11-27 13:42:05,579] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_16_mp_rank_00_optim_states.pt
[2021-11-27 13:42:05,580] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_31_mp_rank_00_optim_states.pt
[2021-11-27 13:42:05,582] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_26_mp_rank_01_optim_states.pt
[2021-11-27 13:42:05,583] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_16_mp_rank_01_optim_states.pt
[2021-11-27 13:42:05,587] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_23_mp_rank_00_optim_states.pt
[2021-11-27 13:42:05,587] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_20_mp_rank_00_optim_states.pt
[2021-11-27 13:42:05,587] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_30_mp_rank_01_optim_states.pt
[2021-11-27 13:42:05,588] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_12_mp_rank_00_optim_states.pt
[2021-11-27 13:42:05,588] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_14_mp_rank_00_optim_states.pt
[2021-11-27 13:42:05,588] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_15_mp_rank_00_optim_states.pt
[2021-11-27 13:42:05,589] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_13_mp_rank_00_optim_states.pt
[2021-11-27 13:42:05,590] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_29_mp_rank_00_optim_states.pt
[2021-11-27 13:42:05,590] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_7_mp_rank_01_optim_states.pt
[2021-11-27 13:42:05,591] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_9_mp_rank_01_optim_states.pt
[2021-11-27 13:42:05,592] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-27 13:42:05,593] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-27 13:42:05,593] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-27 13:42:05,593] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-27 13:42:05,594] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-27 13:42:05,600] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-27 13:42:05,613] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-27 13:42:05,623] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step109500/zero_pp_rank_3_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 109500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2949.68 
iteration 109600/ 152972 | consumed samples: 51035584 | consumed tokens: 104520876032 | elapsed time per iteration (ms): 4711.1 | learning rate: 5.048E-05 | global batch size: 512 | lm loss: 1.438431E+00 | loss scale: 131072.0 | grad norm: 11984.866 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 109800/ 152972 | consumed samples: 51137984 | consumed tokens: 104730591232 | elapsed time per iteration (ms): 4680.0 | learning rate: 5.014E-05 | global batch size: 512 | lm loss: 1.444820E+00 | loss scale: 131072.0 | grad norm: 13728.569 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-27 14:21:06,335] [INFO] [logging.py:68:log_dist] [Rank 0] step=110000, skipped=231, lr=[4.980050077167732e-05, 4.980050077167732e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 110000 loss: 1.3252 iter time (s): 0.002 samples/sec: 219065.935
iteration 110000/ 152972 | consumed samples: 51240384 | consumed tokens: 104940306432 | elapsed time per iteration (ms): 4676.6 | learning rate: 4.980E-05 | global batch size: 512 | lm loss: 1.429600E+00 | loss scale: 262144.0 | grad norm: 27108.773 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 110000 | lm loss value: 1.425428E+00 | lm loss PPL: 4.159636E+00 |
--------------------------------------------------------------------------------------------
iteration 110200/ 152972 | consumed samples: 51342784 | consumed tokens: 105150021632 | elapsed time per iteration (ms): 5219.8 | learning rate: 4.946E-05 | global batch size: 512 | lm loss: 1.478678E+00 | loss scale: 262144.0 | grad norm: 33121.406 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 110400/ 152972 | consumed samples: 51445184 | consumed tokens: 105359736832 | elapsed time per iteration (ms): 4678.0 | learning rate: 4.912E-05 | global batch size: 512 | lm loss: 1.437974E+00 | loss scale: 262144.0 | grad norm: 43326.736 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 110600/ 152972 | consumed samples: 51547584 | consumed tokens: 105569452032 | elapsed time per iteration (ms): 4677.7 | learning rate: 4.879E-05 | global batch size: 512 | lm loss: 1.462623E+00 | loss scale: 65536.0 | grad norm: 8592.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 110800/ 152972 | consumed samples: 51649984 | consumed tokens: 105779167232 | elapsed time per iteration (ms): 4673.7 | learning rate: 4.845E-05 | global batch size: 512 | lm loss: 1.413447E+00 | loss scale: 65536.0 | grad norm: 6726.956 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 111000/ 152972 | consumed samples: 51752384 | consumed tokens: 105988882432 | elapsed time per iteration (ms): 4689.2 | learning rate: 4.812E-05 | global batch size: 512 | lm loss: 1.446468E+00 | loss scale: 65536.0 | grad norm: 8382.582 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 111000 | lm loss value: 1.380306E+00 | lm loss PPL: 3.976120E+00 |
--------------------------------------------------------------------------------------------
saving checkpoint at iteration 111000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-27 15:42:44,191] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/mp_rank_00_model_states.pt
[2021-11-27 15:42:44,618] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,618] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,618] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,620] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,620] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,621] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,621] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,621] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,624] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,626] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,627] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,628] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,629] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,629] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,629] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,631] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,631] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,631] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,634] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,635] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,636] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,638] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,639] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,641] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,642] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,643] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,645] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,645] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,650] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,651] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,653] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,655] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,657] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,658] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,659] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,659] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,659] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,660] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,661] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,662] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,663] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,663] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,664] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,665] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,667] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,668] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,668] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,670] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,671] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,672] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,673] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,676] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,676] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,676] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,677] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,678] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,678] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,681] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,686] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,688] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,689] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,692] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-27 15:42:44,717] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-27 15:42:44,740] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step111000/zero_pp_rank_1_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 111000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2728.98 
iteration 111200/ 152972 | consumed samples: 51854784 | consumed tokens: 106198597632 | elapsed time per iteration (ms): 5238.7 | learning rate: 4.778E-05 | global batch size: 512 | lm loss: 1.395385E+00 | loss scale: 131072.0 | grad norm: 12902.044 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 111400/ 152972 | consumed samples: 51957184 | consumed tokens: 106408312832 | elapsed time per iteration (ms): 4688.4 | learning rate: 4.745E-05 | global batch size: 512 | lm loss: 1.448350E+00 | loss scale: 131072.0 | grad norm: 16174.280 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 111600/ 152972 | consumed samples: 52059584 | consumed tokens: 106618028032 | elapsed time per iteration (ms): 4678.5 | learning rate: 4.712E-05 | global batch size: 512 | lm loss: 1.415001E+00 | loss scale: 131072.0 | grad norm: 12982.231 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 111800/ 152972 | consumed samples: 52161984 | consumed tokens: 106827743232 | elapsed time per iteration (ms): 4722.6 | learning rate: 4.679E-05 | global batch size: 512 | lm loss: 1.426406E+00 | loss scale: 131072.0 | grad norm: 13131.432 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-27 17:00:58,841] [INFO] [logging.py:68:log_dist] [Rank 0] step=112000, skipped=236, lr=[4.645883451229533e-05, 4.645883451229533e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 112000 loss: 0.9619 iter time (s): 0.002 samples/sec: 218130.578
iteration 112000/ 152972 | consumed samples: 52264384 | consumed tokens: 107037458432 | elapsed time per iteration (ms): 4695.9 | learning rate: 4.646E-05 | global batch size: 512 | lm loss: 1.436105E+00 | loss scale: 262144.0 | grad norm: 22587.900 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 112000 | lm loss value: 1.334992E+00 | lm loss PPL: 3.799966E+00 |
--------------------------------------------------------------------------------------------
iteration 112200/ 152972 | consumed samples: 52366784 | consumed tokens: 107247173632 | elapsed time per iteration (ms): 5241.2 | learning rate: 4.613E-05 | global batch size: 512 | lm loss: 1.434371E+00 | loss scale: 131072.0 | grad norm: 11219.685 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 112400/ 152972 | consumed samples: 52469184 | consumed tokens: 107456888832 | elapsed time per iteration (ms): 4687.0 | learning rate: 4.580E-05 | global batch size: 512 | lm loss: 1.450243E+00 | loss scale: 131072.0 | grad norm: 16093.944 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 112500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-27 17:41:54,011] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/mp_rank_00_model_states.pt
[2021-11-27 17:41:54,433] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_0_mp_rank_01_optim_states.pt
[2021-11-27 17:41:54,438] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_4_mp_rank_00_optim_states.pt
[2021-11-27 17:41:54,439] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-27 17:41:54,441] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-27 17:41:54,443] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-27 17:41:54,444] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-27 17:41:54,444] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-27 17:41:54,445] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-27 17:41:54,446] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-27 17:41:54,447] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-27 17:41:54,447] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-27 17:41:54,449] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-27 17:41:54,451] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-27 17:41:54,451] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-27 17:41:54,451] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-27 17:41:54,451] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-27 17:41:54,451] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-27 17:41:54,451] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-27 17:41:54,452] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-27 17:41:54,453] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-27 17:41:54,454] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-27 17:41:54,455] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-27 17:41:54,456] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-27 17:41:54,458] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-27 17:41:54,466] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-27 17:41:54,467] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-27 17:41:54,470] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-27 17:41:54,470] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-27 17:41:54,472] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-27 17:41:54,476] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-27 17:41:54,477] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-27 17:41:54,477] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-27 17:41:54,477] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-27 17:41:54,478] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-27 17:41:54,479] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-27 17:41:54,479] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-27 17:41:54,480] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-27 17:41:54,483] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-27 17:41:54,483] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-27 17:41:54,484] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-27 17:41:54,484] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-27 17:41:54,485] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-27 17:41:54,487] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-27 17:41:54,488] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-27 17:41:54,488] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-27 17:41:54,488] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-27 17:41:54,489] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-27 17:41:54,489] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-27 17:41:54,490] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-27 17:41:54,491] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-27 17:41:54,493] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-27 17:41:54,495] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-27 17:41:54,504] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-27 17:41:54,505] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-27 17:41:54,508] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-27 17:41:54,510] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-27 17:41:54,518] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-27 17:41:54,522] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-27 17:41:54,526] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_12_mp_rank_00_optim_states.pt
[2021-11-27 17:41:54,528] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_13_mp_rank_01_optim_states.pt
[2021-11-27 17:41:54,528] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_14_mp_rank_01_optim_states.pt
[2021-11-27 17:41:54,532] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_20_mp_rank_00_optim_states.pt
[2021-11-27 17:41:54,541] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_2_mp_rank_00_optim_states.pt
[2021-11-27 17:41:54,561] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step112500/zero_pp_rank_3_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 112500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2742.94
iteration 112600/ 152972 | consumed samples: 52571584 | consumed tokens: 107666604032 | elapsed time per iteration (ms): 4685.6 | learning rate: 4.548E-05 | global batch size: 512 | lm loss: 1.426569E+00 | loss scale: 262144.0 | grad norm: 32316.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 112800/ 152972 | consumed samples: 52673984 | consumed tokens: 107876319232 | elapsed time per iteration (ms): 4671.6 | learning rate: 4.515E-05 | global batch size: 512 | lm loss: 1.377542E+00 | loss scale: 262144.0 | grad norm: 32867.198 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 113000/ 152972 | consumed samples: 52776384 | consumed tokens: 108086034432 | elapsed time per iteration (ms): 4671.3 | learning rate: 4.483E-05 | global batch size: 512 | lm loss: 1.439293E+00 | loss scale: 131072.0 | grad norm: 15000.385 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 113000 | lm loss value: 1.396972E+00 | lm loss PPL: 4.042938E+00 |
--------------------------------------------------------------------------------------------
iteration 113200/ 152972 | consumed samples: 52878784 | consumed tokens: 108295749632 | elapsed time per iteration (ms): 5238.4 | learning rate: 4.451E-05 | global batch size: 512 | lm loss: 1.451035E+00 | loss scale: 131072.0 | grad norm: 12554.814 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 113400/ 152972 | consumed samples: 52981184 | consumed tokens: 108505464832 | elapsed time per iteration (ms): 4708.7 | learning rate: 4.419E-05 | global batch size: 512 | lm loss: 1.417217E+00 | loss scale: 262144.0 | grad norm: 27928.908 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 113600/ 152972 | consumed samples: 53083584 | consumed tokens: 108715180032 | elapsed time per iteration (ms): 4679.1 | learning rate: 4.387E-05 | global batch size: 512 | lm loss: 1.399882E+00 | loss scale: 131072.0 | grad norm: 16185.029 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 113800/ 152972 | consumed samples: 53185984 | consumed tokens: 108924895232 | elapsed time per iteration (ms): 4708.3 | learning rate: 4.355E-05 | global batch size: 512 | lm loss: 1.439634E+00 | loss scale: 131072.0 | grad norm: 11642.788 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-27 19:40:53,043] [INFO] [logging.py:68:log_dist] [Rank 0] step=114000, skipped=242, lr=[4.323167674379261e-05, 4.323167674379261e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 114000 loss: 2.1815 iter time (s): 0.002 samples/sec: 219719.699
iteration 114000/ 152972 | consumed samples: 53288384 | consumed tokens: 109134610432 | elapsed time per iteration (ms): 4679.7 | learning rate: 4.323E-05 | global batch size: 512 | lm loss: 1.479021E+00 | loss scale: 65536.0 | grad norm: 9583.500 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 114000 | lm loss value: 1.340873E+00 | lm loss PPL: 3.822381E+00 |
--------------------------------------------------------------------------------------------
saving checkpoint at iteration 114000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-27 19:42:43,846] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/mp_rank_00_model_states.pt
[2021-11-27 19:42:44,267] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_4_mp_rank_01_optim_states.pt
[2021-11-27 19:42:44,271] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_27_mp_rank_00_optim_states.pt
[2021-11-27 19:42:44,271] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-27 19:42:44,271] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-27 19:42:44,273] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-27 19:42:44,276] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-27 19:42:44,277] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-27 19:42:44,278] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-27 19:42:44,279] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-27 19:42:44,281] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-27 19:42:44,282] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-27 19:42:44,282] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-27 19:42:44,283] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-27 19:42:44,284] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-27 19:42:44,284] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-27 19:42:44,288] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-27 19:42:44,288] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-27 19:42:44,290] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-27 19:42:44,291] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-27 19:42:44,293] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-27 19:42:44,293] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-27 19:42:44,294] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-27 19:42:44,294] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-27 19:42:44,296] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-27 19:42:44,297] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-27 19:42:44,298] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-27 19:42:44,299] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-27 19:42:44,299] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-27 19:42:44,302] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-27 19:42:44,303] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-27 19:42:44,304] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-27 19:42:44,305] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-27 19:42:44,305] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-27 19:42:44,305] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-27 19:42:44,305] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-27 19:42:44,306] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-27 19:42:44,306] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-27 19:42:44,309] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-27 19:42:44,311] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-27 19:42:44,313] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-27 19:42:44,314] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-27 19:42:44,314] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-27 19:42:44,315] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-27 19:42:44,315] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-27 19:42:44,316] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-27 19:42:44,316] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-27 19:42:44,317] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-27 19:42:44,317] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-27 19:42:44,318] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-27 19:42:44,319] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-27 19:42:44,320] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-27 19:42:44,322] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-27 19:42:44,323] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-27 19:42:44,325] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-27 19:42:44,326] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-27 19:42:44,329] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-27 19:42:44,332] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-27 19:42:44,334] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-27 19:42:44,334] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_22_mp_rank_01_optim_states.pt
[2021-11-27 19:42:44,337] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_23_mp_rank_00_optim_states.pt
[2021-11-27 19:42:44,340] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_5_mp_rank_01_optim_states.pt
[2021-11-27 19:42:44,343] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_31_mp_rank_00_optim_states.pt
[2021-11-27 19:42:44,356] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2021-11-27 19:42:44,391] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step114000/zero_pp_rank_2_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 114000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2664.99
iteration 114200/ 152972 | consumed samples: 53390784 | consumed tokens: 109344325632 | elapsed time per iteration (ms): 5231.1 | learning rate: 4.291E-05 | global batch size: 512 | lm loss: 1.456752E+00 | loss scale: 65536.0 | grad norm: 4867.962 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 114400/ 152972 | consumed samples: 53493184 | consumed tokens: 109554040832 | elapsed time per iteration (ms): 4676.1 | learning rate: 4.260E-05 | global batch size: 512 | lm loss: 1.480182E+00 | loss scale: 65536.0 | grad norm: 6945.717 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 114600/ 152972 | consumed samples: 53595584 | consumed tokens: 109763756032 | elapsed time per iteration (ms): 4681.6 | learning rate: 4.228E-05 | global batch size: 512 | lm loss: 1.492217E+00 | loss scale: 131072.0 | grad norm: 19946.797 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 114800/ 152972 | consumed samples: 53697984 | consumed tokens: 109973471232 | elapsed time per iteration (ms): 4686.0 | learning rate: 4.197E-05 | global batch size: 512 | lm loss: 1.413797E+00 | loss scale: 131072.0 | grad norm: 13496.443 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 115000/ 152972 | consumed samples: 53800384 | consumed tokens: 110183186432 | elapsed time per iteration (ms): 4665.8 | learning rate: 4.166E-05 | global batch size: 512 | lm loss: 1.461028E+00 | loss scale: 262144.0 | grad norm: 38792.112 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 115000 | lm loss value: 1.360332E+00 | lm loss PPL: 3.897488E+00 |
--------------------------------------------------------------------------------------------
iteration 115200/ 152972 | consumed samples: 53902784 | consumed tokens: 110392901632 | elapsed time per iteration (ms): 5201.8 | learning rate: 4.135E-05 | global batch size: 512 | lm loss: 1.459456E+00 | loss scale: 262144.0 | grad norm: 27245.670 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 115400/ 152972 | consumed samples: 54005184 | consumed tokens: 110602616832 | elapsed time per iteration (ms): 4691.1 | learning rate: 4.104E-05 | global batch size: 512 | lm loss: 1.381001E+00 | loss scale: 262144.0 | grad norm: 26110.796 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 115500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-27 21:41:32,000] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/mp_rank_00_model_states.pt
[2021-11-27 21:41:32,424] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_3_mp_rank_01_optim_states.pt
[2021-11-27 21:41:32,425] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_4_mp_rank_00_optim_states.pt
[2021-11-27 21:41:32,427] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_22_mp_rank_00_optim_states.pt
[2021-11-27 21:41:32,427] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_12_mp_rank_01_optim_states.pt
[2021-11-27 21:41:32,428] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_31_mp_rank_00_optim_states.pt
[2021-11-27 21:41:32,429] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_19_mp_rank_00_optim_states.pt
[2021-11-27 21:41:32,430] [INFO] [engine.py:2540:_save_zero_checkpoint] zero
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,431] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-27 21:41:32,432] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-27 21:41:32,432] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-27 21:41:32,433] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-27 21:41:32,435] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,435] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-27 21:41:32,436] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,436] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,437] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,437] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-27 21:41:32,438] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,439] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-27 21:41:32,440] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,441] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-27 21:41:32,441] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,443] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-27 21:41:32,444] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-27 21:41:32,446] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,453] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,457] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,458] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,461] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,462] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-27 21:41:32,465] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,466] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,467] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-27 21:41:32,468] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-27 21:41:32,468] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-27 21:41:32,469] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-27 21:41:32,471] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,471] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,475] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,476] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,476] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,477] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,478] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,479] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-27 21:41:32,480] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-27 21:41:32,480] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,482] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,482] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-27 21:41:32,483] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,483] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-27 21:41:32,485] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,488] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,490] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-27 21:41:32,490] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-27 21:41:32,491] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-27 21:41:32,492] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-27 21:41:32,493] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,495] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,496] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-27 21:41:32,497] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-27 21:41:32,497] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-27 21:41:32,507] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-27 21:41:32,558] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2021-11-27 21:41:32,590] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step115500/zero_pp_rank_1_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 115500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2688.01
iteration 115600/ 152972 | consumed samples: 54107584 | consumed tokens: 110812332032 | elapsed time per iteration (ms): 4704.3 | learning rate: 4.073E-05 | global batch size: 512 | lm loss: 1.433300E+00 | loss scale: 131072.0 | grad norm: 20387.087 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 115800/ 152972 | consumed samples: 54209984 | consumed tokens: 111022047232 | elapsed time per iteration (ms): 4668.5 | learning rate: 4.043E-05 | global batch size: 512 | lm loss: 1.409133E+00 | loss scale: 65536.0 | grad norm: 13626.067 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-27 22:20:27,500] [INFO] [logging.py:68:log_dist] [Rank 0] step=116000, skipped=245, lr=[4.011896030148179e-05, 4.011896030148179e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 116000 loss: 1.9507 iter time (s): 0.002 samples/sec: 220025.816
iteration 116000/ 152972 | consumed samples: 54312384 | consumed tokens: 111231762432 | elapsed time per iteration (ms): 4666.0 | learning rate: 4.012E-05 | global batch size: 512 | lm loss: 1.383732E+00 | loss scale: 65536.0 | grad norm: 9573.536 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 116000 | lm loss value: 1.384334E+00 | lm loss PPL: 3.992166E+00 |
--------------------------------------------------------------------------------------------
iteration 116200/ 152972 | consumed samples: 54414784 | consumed tokens: 111441477632 | elapsed time per iteration (ms): 5240.1 | learning rate: 3.981E-05 | global batch size: 512 | lm loss: 1.427610E+00 | loss scale: 65536.0 | grad norm: 8221.163 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 116400/ 152972 | consumed samples: 54517184 | consumed tokens: 111651192832 | elapsed time per iteration (ms): 4706.4 | learning rate: 3.951E-05 | global batch size: 512 | lm loss: 1.413807E+00 | loss scale: 131072.0 | grad norm: 13548.991 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 116600/ 152972 | consumed samples: 54619584 | consumed tokens: 111860908032 | elapsed time per iteration (ms): 4685.2 | learning rate: 3.921E-05 | global batch size: 512 | lm loss: 1.488391E+00 | loss scale: 131072.0 | grad norm: 17752.852 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 116800/ 152972 | consumed samples: 54721984 | consumed tokens: 112070623232 | elapsed time per iteration (ms): 4711.4 | learning rate: 3.891E-05 | global batch size: 512 | lm loss: 1.426006E+00 | loss scale: 131072.0 | grad norm: 12139.455 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 117000/ 152972 | consumed samples: 54824384 | consumed tokens: 112280338432 | elapsed time per iteration (ms): 4668.5 | learning rate: 3.861E-05 | global batch size: 512 | lm loss: 1.394733E+00 | loss scale: 65536.0 | grad norm: 6850.813 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 117000 | lm loss value: 1.446448E+00 | lm loss PPL: 4.247998E+00 |
-------------------------------------------------------------------------------------------- saving checkpoint at iteration 117000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-27 23:42:26,048] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/mp_rank_00_model_states.pt [2021-11-27 23:42:26,470] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,474] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,474] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,475] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,477] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,480] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,481] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,481] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,481] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,481] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,484] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,485] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,486] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,487] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,488] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,489] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,490] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,490] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,491] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,492] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,493] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,495] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,495] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,497] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,500] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,500] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,501] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,503] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,504] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,507] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,508] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,508] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,511] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,511] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,511] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,512] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,513] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,515] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,515] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,516] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,516] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,516] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,517] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,518] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,519] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,519] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,521] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,521] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,524] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,524] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,525] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,527] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,528] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,529] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,530] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,531] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,532] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,536] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,538] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,541] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,542] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-27 23:42:26,546] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-27 23:42:26,596] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2021-11-27 23:42:26,638] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step117000/zero_pp_rank_2_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 117000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2825.13
iteration 117200/ 152972 | consumed samples: 54926784 | consumed tokens: 112490053632 | elapsed time per iteration (ms): 5256.0 | learning rate: 3.831E-05 | global batch size: 512 | lm loss: 1.431862E+00 | loss scale: 65536.0 | grad norm: 7639.563 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 117400/ 152972 | consumed samples: 55029184 | consumed tokens: 112699768832 | elapsed time per iteration (ms): 4670.6 | learning rate: 3.801E-05 | global batch size: 512 | lm loss: 1.476360E+00 | loss scale: 131072.0 | grad norm: 12466.604 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 117600/ 152972 | consumed samples: 55131584 | consumed tokens: 112909484032 | elapsed time per iteration (ms): 4673.1 | learning rate: 3.772E-05 | global batch size: 512 | lm loss: 1.461380E+00 | loss scale: 131072.0 | grad norm: 11676.233 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 117800/ 152972 | consumed samples: 55233984 | consumed tokens: 113119199232 | elapsed time per iteration (ms): 4659.0 | learning rate: 3.742E-05 | global batch size: 512 | lm loss: 1.414340E+00 | loss scale: 131072.0 | grad norm: 10460.473 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-28 01:00:15,280] [INFO] [logging.py:68:log_dist] [Rank 0] step=118000, skipped=247, lr=[3.713017967386922e-05, 3.713017967386922e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 118000 loss: 1.7561 iter time (s): 0.002 samples/sec: 219866.359
iteration 118000/ 152972 | consumed samples: 55336384 | consumed tokens: 113328914432 | elapsed time per iteration (ms): 4668.6 | learning rate: 3.713E-05 | global batch size: 512 | lm loss: 1.481292E+00 | loss scale: 262144.0 | grad norm: 32421.224 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 118000 | lm loss value: 1.381361E+00 | lm loss PPL: 3.980314E+00 |
--------------------------------------------------------------------------------------------
iteration 118200/ 152972 | consumed samples: 55438784 | consumed tokens: 113538629632 | elapsed time per iteration (ms): 5223.3 | learning rate: 3.684E-05 | global batch size: 512 | lm loss: 1.430100E+00 | loss scale: 262144.0 | grad norm: 37079.798 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 118400/ 152972 | consumed samples: 55541184 | consumed tokens: 113748344832 | elapsed time per iteration (ms): 4677.2 | learning rate: 3.655E-05 | global batch size: 512 | lm loss: 1.441708E+00 | loss scale: 131072.0 | grad norm: 20927.064 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 118500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-28 01:41:05,323] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/mp_rank_00_model_states.pt
[2021-11-28 01:41:05,745] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,746] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,746] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,749] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,750] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,752] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,754] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,754] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,754] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,755] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,756] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,756] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,756] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,757] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,758] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,758] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,758] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,758] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,759] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,759] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,761] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,764] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,766] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,769] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,770] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,771] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,774] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,775] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,775] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,775] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,781] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,781] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,781] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,783] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,783] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,784] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,788] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,788] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,789] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,789] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,790] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,791] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,792] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,793] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,793] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,794] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,795] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,795] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,797] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,797] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,801] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,801] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,801] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,802] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,802] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,803] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,805] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,810] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,813] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,815] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-28 01:41:05,819] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,823] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,868] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-28 01:41:05,906] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step118500/zero_pp_rank_1_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 118500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2682.62 
iteration 118600/ 152972 | consumed samples: 55643584 | consumed tokens: 113958060032 | elapsed time per iteration (ms): 4683.2 | learning rate: 3.626E-05 | global batch size: 512 | lm loss: 1.402925E+00 | loss scale: 131072.0 | grad norm: 10841.642 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 118800/ 152972 | consumed samples: 55745984 | consumed tokens: 114167775232 | elapsed time per iteration (ms): 4678.3 | learning rate: 3.597E-05 | global batch size: 512 | lm loss: 1.484250E+00 | loss scale: 65536.0 | grad norm: 7511.378 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 119000/ 152972 | consumed samples: 55848384 | consumed tokens: 114377490432 | elapsed time per iteration (ms): 4682.5 | learning rate: 3.569E-05 | global batch size: 512 | lm loss: 1.400547E+00 | loss scale: 65536.0 | grad norm: 5405.578 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 119000 | lm loss value: 1.403785E+00 | lm loss PPL: 4.070578E+00 |
--------------------------------------------------------------------------------------------
iteration 119200/ 152972 | consumed samples: 55950784 | consumed tokens: 114587205632 | elapsed time per iteration (ms): 5220.4 | learning rate: 3.540E-05 | global batch size: 512 | lm loss: 1.386957E+00 | loss scale: 131072.0 | grad norm: 13448.340 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 119400/ 152972 | consumed samples: 56053184 | consumed tokens: 114796920832 | elapsed time per iteration (ms): 4680.6 | learning rate: 3.512E-05 | global batch size: 512 | lm loss: 1.387513E+00 | loss scale: 131072.0 | grad norm: 14227.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 119600/ 152972 | consumed samples: 56155584 | consumed tokens: 115006636032 | elapsed time per iteration (ms): 4672.1 | learning rate: 3.484E-05 | global batch size: 512 | lm loss: 1.414029E+00 | loss scale: 131072.0 | grad norm: 8971.717 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 119800/ 152972 | consumed samples: 56257984 | consumed tokens: 115216351232 | elapsed time per iteration (ms): 4670.5 | learning rate: 3.456E-05 | global batch size: 512 | lm loss: 1.419809E+00 | loss scale: 65536.0 | grad norm: 17114.835 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-28 03:39:47,736] [INFO] [logging.py:68:log_dist] [Rank 0] step=120000, skipped=253, lr=[3.4278292119444187e-05, 3.4278292119444187e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 120000 loss: 2.1450 iter time (s): 0.002 samples/sec: 219152.128
iteration 120000/ 152972 | consumed samples: 56360384 | consumed tokens: 115426066432 | elapsed time per iteration (ms): 4674.3 | learning rate: 3.428E-05 | global batch size: 512 | lm loss: 1.445544E+00 | loss scale: 65536.0 | grad norm: 10485.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 120000 | lm loss value: 1.389239E+00 | lm loss PPL: 4.011795E+00 |
--------------------------------------------------------------------------------------------
saving checkpoint at iteration 120000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-28 03:41:37,951] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/mp_rank_00_model_states.pt
[2021-11-28 03:41:38,369] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,373] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,380] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,380] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,382] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,382] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,383] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,386] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,387] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,388] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,389] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,389] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,389] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,389] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,390] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,390] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,392] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,393] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,394] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,394] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,394] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,398] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,398] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,400] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,401] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,402] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,409] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,411] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,411] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,413] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,414] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,414] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,415] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,415] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,417] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,418] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,418] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,419] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,420] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,421] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,423] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,423] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,426] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,426] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,427] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,427] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,430] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,430] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,430] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,430] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,432] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,432] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,433] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,434] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,435] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,437] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,437] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,438] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,440] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-28 03:41:38,440] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,440] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,445] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,506] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-28 03:41:38,541] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step120000/zero_pp_rank_1_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 120000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2776.13 
iteration 120200/ 152972 | consumed samples: 56462784 | consumed tokens: 115635781632 | elapsed time per iteration (ms): 5227.2 | learning rate: 3.400E-05 | global batch size: 512 | lm loss: 1.408852E+00 | loss scale: 65536.0 | grad norm: 9496.115 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 120400/ 152972 | consumed samples: 56565184 | consumed tokens: 115845496832 | elapsed time per iteration (ms): 4685.0 | learning rate: 3.372E-05 | global batch size: 512 | lm loss: 1.451507E+00 | loss scale: 131072.0 | grad norm: 19398.304 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 120600/ 152972 | consumed samples: 56667584 | consumed tokens: 116055212032 | elapsed time per iteration (ms): 4682.0 | learning rate: 3.345E-05 | global batch size: 512 | lm loss: 1.441109E+00 | loss scale: 131072.0 | grad norm: 16023.217 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 120800/ 152972 | consumed samples: 56769984 | consumed tokens: 116264927232 | elapsed time per iteration (ms): 4691.4 | learning rate: 3.317E-05 | global batch size: 512 | lm loss: 1.384511E+00 | loss scale: 65536.0 | grad norm: 6151.033 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 121000/ 152972 | consumed samples: 56872384 | consumed tokens: 116474642432 | elapsed time per iteration (ms): 4675.6 | learning rate: 3.290E-05 | global batch size: 512 | lm loss: 1.451763E+00 | loss scale: 65536.0 | grad norm: 9596.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 121000 | lm loss value: 1.415734E+00 | lm loss PPL: 4.119510E+00 |
--------------------------------------------------------------------------------------------
iteration 121200/ 152972 | consumed samples: 56974784 |
consumed tokens: 116684357632 | elapsed time per iteration (ms): 5214.9 | learning rate: 3.263E-05 | global batch size: 512 | lm loss: 1.462357E+00 | loss scale: 65536.0 | grad norm: 5923.865 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 121400/ 152972 | consumed samples: 57077184 | consumed tokens: 116894072832 | elapsed time per iteration (ms): 4664.7 | learning rate: 3.236E-05 | global batch size: 512 | lm loss: 1.438424E+00 | loss scale: 131072.0 | grad norm: 13156.621 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | saving checkpoint at iteration 121500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-28 05:40:25,872] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/mp_rank_00_model_states.pt [2021-11-28 05:40:26,294] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,297] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,298] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,299] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,300] [INFO] [engine.py:2540:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,301] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,302] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,302] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,303] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,306] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,306] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,307] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,307] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,308] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,308] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,309] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,310] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,311] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,311] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,311] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,312] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,312] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,313] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,314] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,315] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,323] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,325] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,325] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,325] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,326] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,326] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,327] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,331] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,336] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,337] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,339] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,340] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,342] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,342] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,342] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,343] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,343] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,345] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,345] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,345] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,346] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,347] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,347] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,349] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,350] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,350] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,353] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,355] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,356] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,356] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,360] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,361] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,362] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,363] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,367] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-28 05:40:26,379] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,379] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,440] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-28 05:40:26,477] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step121500/zero_pp_rank_2_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 121500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2708.32 iteration 121600/ 152972 | consumed samples: 57179584 | consumed tokens: 117103788032 | elapsed time per iteration (ms): 4684.1 | learning rate: 3.209E-05 | global batch size: 512 | lm loss: 1.400382E+00 | loss scale: 131072.0 | grad norm: 15912.834 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 121800/ 152972 | consumed samples: 57281984 | consumed tokens: 117313503232 | elapsed time per iteration (ms): 4681.4 | learning rate: 3.183E-05 | global batch size: 512 | lm loss: 1.447576E+00 | loss scale: 65536.0 | grad norm: 8117.760 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [2021-11-28 06:19:29,792] [INFO] [logging.py:68:log_dist] [Rank 0] step=122000, skipped=257, lr=[3.156002859140684e-05, 3.156002859140684e-05], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 122000 loss: 1.1235 iter time (s): 0.002 samples/sec: 219599.381 iteration 122000/ 152972 | consumed samples: 
57384384 | consumed tokens: 117523218432 | elapsed time per iteration (ms): 4703.9 | learning rate: 3.156E-05 | global batch size: 512 | lm loss: 1.395596E+00 | loss scale: 65536.0 | grad norm: 6105.851 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 122000 | lm loss value: 1.360188E+00 | lm loss PPL: 3.896924E+00 |
--------------------------------------------------------------------------------------------
iteration 122200/ 152972 | consumed samples: 57486784 | consumed tokens: 117732933632 | elapsed time per iteration (ms): 5215.1 | learning rate: 3.130E-05 | global batch size: 512 | lm loss: 1.427091E+00 | loss scale: 131072.0 | grad norm: 19275.343 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 122400/ 152972 | consumed samples: 57589184 | consumed tokens: 117942648832 | elapsed time per iteration (ms): 4671.4 | learning rate: 3.103E-05 | global batch size: 512 | lm loss: 1.421398E+00 | loss scale: 131072.0 | grad norm: 17213.585 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 122600/ 152972 | consumed samples: 57691584 | consumed tokens: 118152364032 | elapsed time per iteration (ms): 4680.1 | learning rate: 3.077E-05 | global batch size: 512 | lm loss: 1.408277E+00 | loss scale: 131072.0 | grad norm: 14357.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 122800/ 152972 | consumed samples: 57793984 | consumed tokens: 118362079232 | elapsed time per iteration (ms): 4668.1 | learning rate: 3.051E-05 | global batch size: 512 | lm loss: 1.419674E+00 | loss scale: 262144.0 | grad norm: 33563.521 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 123000/ 152972 | consumed samples: 57896384 | consumed tokens: 118571794432 | elapsed time per
iteration (ms): 4687.7 | learning rate: 3.025E-05 | global batch size: 512 | lm loss: 1.467755E+00 | loss scale: 131072.0 | grad norm: 15590.128 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | -------------------------------------------------------------------------------------------- valid loss at iteration 123000 | lm loss value: 1.440381E+00 | lm loss PPL: 4.222302E+00 | -------------------------------------------------------------------------------------------- saving checkpoint at iteration 123000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-28 07:41:10,716] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/mp_rank_00_model_states.pt [2021-11-28 07:41:11,135] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,136] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,138] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,139] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,141] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,142] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,146] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,146] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,147] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,148] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,149] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,150] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,150] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,151] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,151] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,151] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,151] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,152] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,154] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,154] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,155] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,155] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,157] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,158] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,160] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,162] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,170] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,171] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,172] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,174] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,175] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,176] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,176] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,176] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,176] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,177] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,177] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,178] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,179] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,181] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,182] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,183] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,184] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,184] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,186] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,187] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,187] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,188] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,188] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,190] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,192] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,192] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,193] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,194] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,195] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,199] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,200] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,202] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,213] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,214] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-28 07:41:11,214] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,217] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,242] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-28 07:41:11,285] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123000/zero_pp_rank_2_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 123000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2717.47 iteration 123200/ 152972 | consumed samples: 57998784 | consumed tokens: 118781509632 | elapsed time per iteration (ms): 5260.5 | learning rate: 3.000E-05 | global batch size: 512 | lm loss: 1.389301E+00 | loss scale: 131072.0 | grad norm: 17588.781 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 123400/ 152972 | consumed samples: 58101184 | consumed tokens: 118991224832 | elapsed time per iteration (ms): 4675.0 | learning rate: 2.974E-05 | global batch size: 512 | lm loss: 1.412470E+00 | loss scale: 65536.0 | grad norm: 7366.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 123600/ 152972 | consumed samples: 58203584 | consumed tokens: 119200940032 | elapsed time per iteration (ms): 4665.3 | learning rate: 2.949E-05 | global batch size: 512 | lm loss: 1.411442E+00 | loss scale: 65536.0 | grad norm: 8067.167 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | saving checkpoint at iteration 123761 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-28 08:40:29,164] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/mp_rank_00_model_states.pt [2021-11-28 08:40:29,582] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,583] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,583] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,589] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,590] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,591] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,593] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,595] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,595] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,598] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,601] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,601] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,604] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,605] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,605] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,607] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,610] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,612] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,612] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,616] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,617] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,618] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,618] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,618] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,620] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,622] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,624] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,625] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,625] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,626] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,627] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,627] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,628] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,628] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,629] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,630] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,631] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,632] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,632] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,633] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,634] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,635] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,635] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,636] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,636] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,636] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,637] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,638] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,639] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,639] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,640] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,642] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,644] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,644] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,646] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,646] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,650] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,653] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,654] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,654] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,667] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-28 08:40:29,687] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,690] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-28 08:40:29,696] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step123761/zero_pp_rank_14_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 123761 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2803.04 [exiting program after 1190.0520441969236 minutes] datetime: 2021-11-28 08:40:29 ***************************************** Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
***************************************** -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed ..ninja compatible .................. 
--------------------------------------------------[OKAY] -------------------------------------------------- op name ................ installedcpu_adam ..ninja............... compatible..................[YES] [OKAY]--------------------------------------------------...... --------------------------------------------------[OKAY] op name ................ installed ..cpu_adam fused_adam compatible ............... ............. --------------------------------------------------[YES][YES] ............ [OKAY][OKAY]cpu_adam ............... [YES] fused_lamb...... .............[OKAY] fused_adam[YES] ............. ......[YES] fused_adam[OKAY]...... .............[OKAY] [YES] ...... [OKAY] fused_lamb ............. fused_lamb[YES] sparse_attn ............. ...... ............[YES][OKAY] [NO]...... .......[OKAY] [OKAY] transformer ............ [YES] ...... [OKAY] sparse_attn ............ sparse_attnstochastic_transformer[NO] .................... [YES][OKAY] [NO]...... .......[OKAY]transformer [OKAY]............ [YES] ......transformer [OKAY] ............ [YES] ...... stochastic_transformer[OKAY] . [YES] ...... [OKAY]stochastic_transformer . [YES] ...... [OKAY] ninjaninja .................................... [OKAY][OKAY] ---------------------------------------------------------------------------------------------------- op nameop name ................................ installedinstalled .... compatiblecompatible ---------------------------------------------------------------------------------------------------- cpu_adamcpu_adam .............................. [YES][YES] ............ [OKAY][OKAY] fused_adam fused_adam............. .............[YES] [YES]...... ......[OKAY] ninja[OKAY] ninja fused_lamb.................. fused_lamb...............................[OKAY] [OKAY].............[YES] --------------------------------------------------[YES] ......-------------------------------------------------- op name...... [OKAY] [OKAY] op name................ 
................installed installed.. ..compatiblesparse_attn sparse_attn compatible--------------------------------------------------............ ............ [NO]-------------------------------------------------- [NO]....... .......[OKAY] [OKAY] cpu_adam transformer...............transformercpu_adam ............[YES]........................... ......[YES][YES][YES] [OKAY].................. [OKAY][OKAY][OKAY] fused_adamstochastic_transformerstochastic_transformer .............fused_adam.. [YES][YES][YES]............. ............[YES]...... [OKAY][OKAY][OKAY]...... [OKAY] fused_lamb ............. fused_lamb[YES] ................... [OKAY][YES] ...... [OKAY] sparse_attn ............sparse_attn [NO]............ .......[NO] [OKAY]....... [OKAY] transformer ............transformer [YES]............ ...... [YES][OKAY] ...... [OKAY] stochastic_transformer . [YES]stochastic_transformer ....... [OKAY][YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] ninja .................. [OKAY] fused_adam --------------------------------------------------............. op name[YES] ...................... installed[OKAY] .. compatible -------------------------------------------------- fused_lamb ............. [YES] ...... [OKAY] cpu_adam ............... [YES] ...... [OKAY] sparse_attn fused_adam............ .............[NO] [YES] ............. [OKAY][OKAY] transformerfused_lamb ......................... [YES][YES] ............ [OKAY][OKAY] stochastic_transformer . [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] ninja ninja.................. [OKAY].................. [OKAY]-------------------------------------------------- --------------------------------------------------op name ................op name installed................ ..installed compatible.. compatible-------------------------------------------------- -------------------------------------------------- cpu_adam ...............cpu_adam [YES]............... ......[YES] [OKAY]...... [OKAY] fused_adam ............. [YES] ...... fused_adam[OKAY] ............. [YES] fused_lamb...... .............[OKAY] [YES] ...... 
[OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] sparse_attn ............ transformer[NO] ................... [YES][OKAY] ...... [OKAY] transformer ............ [YES]stochastic_transformer ....... [OKAY][YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- --------------------------------------------------DeepSpeed C++/CUDA extension op report --------------------------------------------------DeepSpeed C++/CUDA extension op report NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible --------------------------------------------------ninja .................. [OKAY] -------------------------------------------------- cpu_adamop name ............................... [YES]installed ........ [OKAY]compatible -------------------------------------------------- fused_adam ............. [YES] cpu_adam...... ...............[OKAY] [YES] ...... [OKAY]fused_lamb ninja ............. [YES].................. ......[OKAY]ninja [OKAY] fused_adam--------------------------------------------------.................. .............[OKAY] op name [YES] ................--------------------------------------------------...... installed[OKAY] op name .. ................compatiblefused_lamb sparse_attninstalled -------------------------------------------------- ............. .. ............ [YES][NO]compatible ............. -------------------------------------------------- [OKAY] [OKAY] cpu_adam ............... [YES]transformer .................. cpu_adam[OKAY][YES] ..................... [YES][OKAY] ...... sparse_attn[OKAY] stochastic_transformer............ fused_adam.[NO] .............[YES]....... [YES]......fused_adam [OKAY] ...... .............[OKAY] [OKAY][YES] transformer...... ............[OKAY] fused_lamb[YES] ................... fused_lamb[YES][OKAY] ................... [YES][OKAY] stochastic_transformer...... .[OKAY] [YES] ...... [OKAY] sparse_attnsparse_attn ........................ [NO][NO] .............. [OKAY][OKAY] transformertransformer ........................ [YES][YES] ............ [OKAY][OKAY] stochastic_transformer stochastic_transformer. .[YES] [YES]...... 
......[OKAY] [OKAY] ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- JIT compiled ops requires ninja DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] ninjafused_lamb ............................... 
[OKAY][YES] ninja......-------------------------------------------------- [OKAY]..................op name [OKAY]................ installed ..-------------------------------------------------- compatibleninja -------------------------------------------------- op name sparse_attn.................. ................ [OKAY] ............ installed [NO]--------------------------------------------------cpu_adam .. ......................op namecompatible ................[OKAY][YES]-------------------------------------------------- installed ........ [OKAY]transformer compatible ............-------------------------------------------------- [YES]cpu_adam ......fused_adam............... [OKAY].............[YES] cpu_adam[YES]...... stochastic_transformer ............... [OKAY] ....... [YES] [OKAY] ......[YES] [OKAY] ......fused_lambfused_adam .............[OKAY]............. [YES][YES] fused_adam...... ...................[OKAY] [YES] [OKAY]...... [OKAY] fused_lambfused_lamb .......................... sparse_attn [YES] [YES]............ ............[NO] [OKAY]....... [OKAY] [OKAY] transformer ............ [YES] ...... [OKAY] sparse_attn ............sparse_attnstochastic_transformer [NO]. ...................[YES] [OKAY]......[NO] [OKAY]....... transformer [OKAY]............ [YES] ...... [OKAY]transformer ............ [YES] ......stochastic_transformer [OKAY]. [YES] ......stochastic_transformer [OKAY] . [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report -------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op reportDeepSpeed C++/CUDA extension op report ---------------------------------------------------------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. ---------------------------------------------------------------------------------------------------- ninjaninjacpu_adam .................. .................. ninja[OKAY]............... ..................[OKAY]-------------------------------------------------- JIT compiled ops requires ninjaJIT compiled ops requires ninja ---------------------------------------------------------------------------------------------------- DeepSpeed C++/CUDA extension op report DeepSpeed C++/CUDA extension op report-------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.-------------------------------------------------- --------------------------------------------------NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. [YES][OKAY] op name-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- JIT compiled ops requires ninja ...... op name--------------------------------------------------[OKAY]................ op name................installed ..................installed installedcompatiblefused_adam.. ..-------------------------------------------------- compatible............. compatible--------------------------------------------------[YES] --------------------------------------------------...... [OKAY] cpu_adam ............... [YES]cpu_adam fused_lamb ..................... cpu_adam ............. [YES][OKAY] ............... [YES] ...... [YES] ......[OKAY] ......[OKAY] fused_adam[OKAY] ............. fused_adam[YES] ................... [YES]fused_adam[OKAY] sparse_attn...... 
.........................[OKAY] [NO]fused_lamb[YES] fused_lamb....... ............. ...................[OKAY][YES] [YES] ......[OKAY]...... transformer[OKAY] [OKAY] ............fused_lamb [YES]............. ......[YES] [OKAY]...... [OKAY] sparse_attn stochastic_transformer............sparse_attn [NO]............. .......sparse_attn[YES][NO] [OKAY]......................... [NO] [OKAY][OKAY]....... transformer ............[OKAY]transformer [YES]............ ......transformer [YES][OKAY]............ ......[YES] [OKAY]stochastic_transformer...... .[OKAY] stochastic_transformer[YES] ....... stochastic_transformer[YES] [OKAY] . ...... [OKAY][YES] ...... [OKAY] ninja .................. [OKAY]ninja --------------------------------------------------.................. op name[OKAY] ................ installed-------------------------------------------------- ..ninja op namecompatible ninja..................................-------------------------------------------------- ..................installed[OKAY] ..[OKAY] --------------------------------------------------compatible --------------------------------------------------op name-------------------------------------------------- ................op name installedcpu_adam................ ..cpu_adaminstalled ................................ compatible [YES] [YES]compatible -------------------------------------------------- ............-------------------------------------------------- [OKAY] [OKAY]cpu_adam ............... [YES] cpu_adam...... fused_adam...............[OKAY] fused_adam.............[YES] .............[YES] ............fused_adam [OKAY] [YES].............[OKAY] [YES] ......fused_lamb...... [OKAY].............[OKAY] [YES] ...... fused_lambfused_adam[OKAY] fused_lamb............. .......................... [YES][YES] ...... ...... [YES][OKAY] [OKAY]......sparse_attn fused_lamb............[OKAY] [NO]............. .......[YES] [OKAY]sparse_attn ...... ............[OKAY] transformer[NO] ............ 
.......[YES] [OKAY]sparse_attn...... ............[OKAY]sparse_attn transformer [NO] ........................ [NO] stochastic_transformer[YES] ............... ...... [YES] [OKAY] [OKAY][OKAY] ...... transformer [OKAY]transformer ............stochastic_transformer............ .[YES][YES] [YES]............ ......[OKAY] [OKAY][OKAY] stochastic_transformer . [YES]stochastic_transformer ....... [YES][OKAY] ...... [OKAY] ninja ..................ninja [OKAY] .................. --------------------------------------------------[OKAY] op name --------------------------------------------------................ninja installed op name..................ninja .. ................ [OKAY]compatible .................. installed -------------------------------------------------- [OKAY]-------------------------------------------------- .. op name -------------------------------------------------- compatible ................ op name--------------------------------------------------installed ................cpu_adam.. installed...............compatible cpu_adam .. --------------------------------------------------............... [YES]compatible[YES] ............-------------------------------------------------- [OKAY]cpu_adam[OKAY] ............... [YES] cpu_adam...... fused_adam...............[OKAY] fused_adam ............. [YES] ............. [YES] ...... [YES] ......fused_adam[OKAY] ...... [OKAY] ............. [OKAY] [YES] ......fused_lamb fused_adamfused_lamb.............[OKAY] .............[YES]............. ......[YES]fused_lamb[YES] [OKAY]......................... [OKAY][YES][OKAY] ...... [OKAY] fused_lamb ............. [YES] ......sparse_attn [OKAY]............ sparse_attn[NO] sparse_attn................... ............[OKAY][NO] [NO]....... transformer sparse_attn....... [OKAY] ............ ............ [OKAY] [YES]transformer [NO] ..................transformer ....... [OKAY][YES] ............ [OKAY] ...... [YES] stochastic_transformer[OKAY]......transformer . ............ 
[OKAY] [YES] [YES]stochastic_transformer ......stochastic_transformer....... [OKAY].[OKAY] [YES] [YES] ...... ......[OKAY]stochastic_transformer [OKAY]. [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed ..ninja compatible ..................-------------------------------------------------- [OKAY] -------------------------------------------------- op name ................cpu_adam installed ................. [YES] compatible...... [OKAY]-------------------------------------------------- fused_adamcpu_adam ............................ [YES][YES] ............ [OKAY][OKAY] fused_lamb ............. [YES] ......fused_adam [OKAY]............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... sparse_attn[OKAY] ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY]sparse_attn ............ [NO] .......stochastic_transformer [OKAY]. [YES] transformer...... ............[OKAY] [YES] ...... [OKAY] stochastic_transformer . [YES] ...... 
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... 
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY]
-------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed.
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY]
-------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
-------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY]
-------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed.
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY]
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
-------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY]
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
[WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
 [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ...........
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info:  [WARNING]  async_io: please install the libaio-devel package with yum torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... 
[NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum DeepSpeed general environment info:  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 async_io ............... [NO] ....... [NO] torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] async_io ............... [NO] ....... [NO] --------------------------------------------------  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] async_io ............... [NO] ....... [NO] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... 
[OKAY]  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 
1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum DeepSpeed general environment info:  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 async_io ............... [NO] ....... [NO] nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO]  [WARNING]  async_io: please install the libaio-devel package with yum transformer_inference .. [NO] ....... 
[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY]  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. -------------------------------------------------- async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 
1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ...............
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version ....................
1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ...................
0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
using world size: 64, data-parallel-size: 32, tensor-model-parallel size: 1, pipeline-model-parallel size: 2
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. None
consumed_train_samples .......................... 0
consumed_train_tokens ........................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
curriculum_learning ............................. False
data_impl ....................................... mmap
data_parallel_size .............................. 32
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.30245.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 100
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 8192
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
gigaflos_no_embeds .............................. 0
global_batch_size ............................... 512
glu_activation .................................. None
hidden_dropout .................................. 0.1
hidden_size ..................................... 2048
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 128
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 200
log_learning_rate_to_tensorboard ................ True
log_level ....................................... None
log_level_replica ............................... None
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_on_targets_only ............................ True
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 0.0002
lr_decay_iters .................................. None
lr_decay_samples ................................ 73242187
lr_decay_style .................................. cosine
lr_decay_tokens ................................. None
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 183105
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/data/gpt2-merges.txt
micro_batch_size ................................ 1
min_loss_scale .................................. 1.0
min_lr .......................................... 1e-05
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 16
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 24
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 2
position_embedding_type ......................... PositionEmbeddingType.absolute
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['32', '32', '2_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
reweight_loss_based_on_position_frequency ....... True
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 1234
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 1
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/tr6g-1B3-oscar-loss-reweighting-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
test_weighted_split_names ....................... None
test_weighted_split_paths ....................... None
test_weighted_split_splits ...................... None
test_weighted_split_weights ..................... None
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 73242187
train_tokens .................................... None
train_weighted_split_paths ...................... None
use_bnb_optimizer ............................... False
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
valid_weighted_split_names ...................... None
valid_weighted_split_paths ...................... None
valid_weighted_split_splits ..................... None
valid_weighted_split_weights .................... None
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 64
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
> building GPT2BPETokenizer tokenizer ...
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [YES] ...... [OKAY]
quantizer .............. [NO] .......
[OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ...............
11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. > padded vocab (size: 50257) with 47 dummy tokens (new size: 50304) > initializing torch distributed ... **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. 
[YES] ...... [OKAY] **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO]  [WARNING]  async_io: please install the libaio-devel package with yum transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** > setting tensorboard ... 
> initializing tensor model parallel with size 1
> initializing pipeline model parallel with size 2
> setting random seeds to 1234 ...
[2021-11-28 08:40:55,823] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/data'
make: Nothing to be done for 'default'.
make: Leaving directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/data'
>>> done with dataset index builder. Compilation time: 0.364 seconds
> compiling and loading fused kernels ...
/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_upper_triang_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_upper_triang_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module scaled_masked_softmax_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module scaled_masked_softmax_cuda...
Detected CUDA files, patching ldflags
Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja...
Building extension module fused_mix_prec_layer_norm_cuda...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module fused_mix_prec_layer_norm_cuda...
>>> done with compiling and loading fused kernels. Compilation time: 22.798 seconds
time to initialize megatron (seconds): 30.988
[after megatron is initialized] datetime: 2021-11-28 08:41:18
building GPT model ...
[2021-11-28 08:41:19,026] [INFO] [utils.py:806:see_memory_usage] Before Building Model [2021-11-28 08:41:19,027] [INFO] [utils.py:807:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB [2021-11-28 08:41:19,027] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.02 GB, percent = 21.4% SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pipe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=1, data=0, model=0): 32, ProcessCoord(pipe=1, data=1, model=0): 33, ProcessCoord(pipe=1, data=2, model=0): 34, ProcessCoord(pipe=1, data=3, model=0): 35, ProcessCoord(pipe=1, data=4, model=0): 36, 
ProcessCoord(pipe=1, data=5, model=0): 37, ProcessCoord(pipe=1, data=6, model=0): 38, ProcessCoord(pipe=1, data=7, model=0): 39, ProcessCoord(pipe=1, data=8, model=0): 40, ProcessCoord(pipe=1, data=9, model=0): 41, ProcessCoord(pipe=1, data=10, model=0): 42, ProcessCoord(pipe=1, data=11, model=0): 43, ProcessCoord(pipe=1, data=12, model=0): 44, ProcessCoord(pipe=1, data=13, model=0): 45, ProcessCoord(pipe=1, data=14, model=0): 46, ProcessCoord(pipe=1, data=15, model=0): 47, ProcessCoord(pipe=1, data=16, model=0): 48, ProcessCoord(pipe=1, data=17, model=0): 49, ProcessCoord(pipe=1, data=18, model=0): 50, ProcessCoord(pipe=1, data=19, model=0): 51, ProcessCoord(pipe=1, data=20, model=0): 52, ProcessCoord(pipe=1, data=21, model=0): 53, ProcessCoord(pipe=1, data=22, model=0): 54, ProcessCoord(pipe=1, data=23, model=0): 55, ProcessCoord(pipe=1, data=24, model=0): 56, ProcessCoord(pipe=1, data=25, model=0): 57, ProcessCoord(pipe=1, data=26, model=0): 58, ProcessCoord(pipe=1, data=27, model=0): 59, ProcessCoord(pipe=1, data=28, model=0): 60, ProcessCoord(pipe=1, data=29, model=0): 61, ProcessCoord(pipe=1, data=30, model=0): 62, ProcessCoord(pipe=1, data=31, model=0): 63} [2021-11-28 08:41:20,317] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer stage=0 layers=15 0: _to_float16 1: EmbeddingPipe 2: 3: ParallelTransformerLayerPipe 4: ParallelTransformerLayerPipe 5: ParallelTransformerLayerPipe 6: ParallelTransformerLayerPipe 7: ParallelTransformerLayerPipe 8: ParallelTransformerLayerPipe 9: ParallelTransformerLayerPipe 10: ParallelTransformerLayerPipe 11: ParallelTransformerLayerPipe 12: ParallelTransformerLayerPipe 13: ParallelTransformerLayerPipe 14: ParallelTransformerLayerPipe stage=1 layers=17 15: ParallelTransformerLayerPipe 16: ParallelTransformerLayerPipe 17: ParallelTransformerLayerPipe 18: ParallelTransformerLayerPipe 19: ParallelTransformerLayerPipe 20: ParallelTransformerLayerPipe 21: ParallelTransformerLayerPipe 
22: ParallelTransformerLayerPipe 23: ParallelTransformerLayerPipe 24: ParallelTransformerLayerPipe 25: ParallelTransformerLayerPipe 26: ParallelTransformerLayerPipe 27: 28: 29: MixedFusedLayerNorm 30: EmbeddingPipe 31: float16_to_fp32 loss: CrossEntropy > number of parameters on (tensor, pipeline) model parallel rank (0, 1): 711520256 [2021-11-28 08:41:20,855] [INFO] [utils.py:806:see_memory_usage] After Building Model [2021-11-28 08:41:20,855] [INFO] [utils.py:807:see_memory_usage] MA 1.33 GB Max_MA 1.33 GB CA 1.36 GB Max_CA 1 GB [2021-11-28 08:41:20,855] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.24 GB, percent = 21.5% > number of parameters on (tensor, pipeline) model parallel rank (0, 0): 711516160 setting training iterations to 152972 > learning rate decay style: cosine DeepSpeed is enabled. [2021-11-28 08:41:20,874] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.5.5+58a8e13, git-hash=58a8e13, git-branch=master [2021-11-28 08:41:21,182] [INFO] [engine.py:207:__init__] DeepSpeed Flops Profiler Enabled: False [2021-11-28 08:41:21,182] [INFO] [engine.py:862:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer [2021-11-28 08:41:21,182] [INFO] [engine.py:868:_configure_optimizer] Using client Optimizer as basic optimizer [2021-11-28 08:41:21,186] [INFO] [engine.py:884:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam [2021-11-28 08:41:21,186] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type= [2021-11-28 08:41:21,186] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer [2021-11-28 08:41:21,186] [INFO] [stage2.py:111:__init__] Reduce bucket size 500000000 [2021-11-28 08:41:21,186] [INFO] [stage2.py:112:__init__] Allgather bucket size 500000000 [2021-11-28 08:41:21,186] [INFO] [stage2.py:113:__init__] CPU Offload: False [2021-11-28 08:41:21,186] [INFO] [stage2.py:114:__init__] Round robin gradient 
partitioning: False Rank: 3 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 10 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 19 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 34 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 31 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 6 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 39 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 8 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 27 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 16 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 29 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 42 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 32 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 59 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 50 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 46 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 54 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 37 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 7 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 40 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 57 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 53 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 47 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 21 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 63 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 26 partition 
count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 4 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 20 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 60 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 12 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 5 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 0 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 9 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 51 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 48 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 44 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 24 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 33 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 13 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 36 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 41 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 17 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 49 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 25 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 56 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 11 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 45 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 22 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 30 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 43 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 23 partition count [32, 32] and sizes[(22224896, 
False), (9984, False)] Rank: 18 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 28 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 35 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 58 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 52 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 38 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 61 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 55 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 62 partition count [32, 32] and sizes[(22224896, False), (10112, False)] Rank: 14 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 15 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 1 partition count [32, 32] and sizes[(22224896, False), (9984, False)] Rank: 2 partition count [32, 32] and sizes[(22224896, False), (9984, False)] [2021-11-28 08:41:23,824] [INFO] [utils.py:806:see_memory_usage] Before initializing optimizer states [2021-11-28 08:41:23,825] [INFO] [utils.py:807:see_memory_usage] MA 1.41 GB Max_MA 1.45 GB CA 2.71 GB Max_CA 3 GB [2021-11-28 08:41:23,825] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 41.96 GB, percent = 22.4% [2021-11-28 08:41:23,857] [INFO] [utils.py:806:see_memory_usage] After initializing optimizer states [2021-11-28 08:41:23,858] [INFO] [utils.py:807:see_memory_usage] MA 1.57 GB Max_MA 1.66 GB CA 2.96 GB Max_CA 3 GB [2021-11-28 08:41:23,858] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 42.08 GB, percent = 22.5% [2021-11-28 08:41:23,858] [INFO] [stage2.py:474:__init__] optimizer state initialized [2021-11-28 08:41:23,886] [INFO] [utils.py:806:see_memory_usage] After initializing ZeRO optimizer [2021-11-28 08:41:23,887] [INFO] [utils.py:807:see_memory_usage] MA 1.57 GB Max_MA 1.57 GB CA 2.96 GB 
Max_CA 3 GB [2021-11-28 08:41:23,887] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 42.17 GB, percent = 22.5% [2021-11-28 08:41:23,887] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam [2021-11-28 08:41:23,887] [INFO] [engine.py:599:_configure_lr_scheduler] DeepSpeed using client LR scheduler [2021-11-28 08:41:23,887] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler = [2021-11-28 08:41:23,887] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)] [2021-11-28 08:41:23,887] [INFO] [config.py:940:print] DeepSpeedEngine configuration: [2021-11-28 08:41:23,888] [INFO] [config.py:944:print] activation_checkpointing_config { "partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false } [2021-11-28 08:41:23,888] [INFO] [config.py:944:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True} [2021-11-28 08:41:23,888] [INFO] [config.py:944:print] allreduce_always_fp32 ........ False [2021-11-28 08:41:23,888] [INFO] [config.py:944:print] amp_enabled .................. False [2021-11-28 08:41:23,888] [INFO] [config.py:944:print] amp_params ................... False [2021-11-28 08:41:23,888] [INFO] [config.py:944:print] checkpoint_tag_validation_enabled True [2021-11-28 08:41:23,888] [INFO] [config.py:944:print] checkpoint_tag_validation_fail False [2021-11-28 08:41:23,888] [INFO] [config.py:944:print] curriculum_enabled ........... False [2021-11-28 08:41:23,888] [INFO] [config.py:944:print] curriculum_params ............ False [2021-11-28 08:41:23,888] [INFO] [config.py:944:print] dataloader_drop_last ......... False [2021-11-28 08:41:23,888] [INFO] [config.py:944:print] disable_allgather ............ 
False [2021-11-28 08:41:23,888] [INFO] [config.py:944:print] dump_state ................... False [2021-11-28 08:41:23,888] [INFO] [config.py:944:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1} [2021-11-28 08:41:23,888] [INFO] [config.py:944:print] eigenvalue_enabled ........... False [2021-11-28 08:41:23,888] [INFO] [config.py:944:print] eigenvalue_gas_boundary_resolution 1 [2021-11-28 08:41:23,888] [INFO] [config.py:944:print] eigenvalue_layer_name ........ bert.encoder.layer [2021-11-28 08:41:23,888] [INFO] [config.py:944:print] eigenvalue_layer_num ......... 0 [2021-11-28 08:41:23,888] [INFO] [config.py:944:print] eigenvalue_max_iter .......... 100 [2021-11-28 08:41:23,888] [INFO] [config.py:944:print] eigenvalue_stability ......... 1e-06 [2021-11-28 08:41:23,888] [INFO] [config.py:944:print] eigenvalue_tol ............... 0.01 [2021-11-28 08:41:23,888] [INFO] [config.py:944:print] eigenvalue_verbose ........... False [2021-11-28 08:41:23,888] [INFO] [config.py:944:print] elasticity_enabled ........... False [2021-11-28 08:41:23,888] [INFO] [config.py:944:print] flops_profiler_config ........ { "enabled": false, "profile_step": 1, "module_depth": -1, "top_modules": 1, "detailed": true, "output_file": null } [2021-11-28 08:41:23,888] [INFO] [config.py:944:print] fp16_enabled ................. True [2021-11-28 08:41:23,888] [INFO] [config.py:944:print] fp16_master_weights_and_gradients False [2021-11-28 08:41:23,888] [INFO] [config.py:944:print] fp16_mixed_quantize .......... False [2021-11-28 08:41:23,888] [INFO] [config.py:944:print] global_rank .................. 0 [2021-11-28 08:41:23,888] [INFO] [config.py:944:print] gradient_accumulation_steps .. 16 [2021-11-28 08:41:23,888] [INFO] [config.py:944:print] gradient_clipping ............ 1.0 [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] gradient_predivide_factor .... 
1.0 [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] initial_dynamic_scale ........ 4096 [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] loss_scale ................... 0 [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] memory_breakdown ............. False [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] optimizer_legacy_fusion ...... False [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] optimizer_name ............... None [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] optimizer_params ............. None [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0} [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] pld_enabled .................. False [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] pld_params ................... False [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] prescale_gradients ........... False [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] quantize_change_rate ......... 0.001 [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] quantize_groups .............. 1 [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] quantize_offset .............. 1000 [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] quantize_period .............. 1000 [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] quantize_rounding ............ 0 [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] quantize_start_bits .......... 16 [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] quantize_target_bits ......... 8 [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] quantize_training_enabled .... False [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] quantize_type ................ 0 [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] quantize_verbose ............. False [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] scheduler_name ............... 
None [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] scheduler_params ............. None [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] sparse_attention ............. None [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] sparse_gradients_enabled ..... False [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] steps_per_print .............. 2000 [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] tensorboard_enabled .......... False [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] tensorboard_job_name ......... DeepSpeedJobName [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] tensorboard_output_path ...... [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] train_batch_size ............. 512 [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] train_micro_batch_size_per_gpu 1 [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] use_quantizer_kernel ......... False [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] wall_clock_breakdown ......... False [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] world_size ................... 32 [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] zero_allow_untested_optimizer False [2021-11-28 08:41:23,889] [INFO] [config.py:944:print] zero_config .................. { "stage": 1, "contiguous_gradients": true, "reduce_scatter": true, "reduce_bucket_size": 5.000000e+08, "allgather_partitions": true, "allgather_bucket_size": 5.000000e+08, "overlap_comm": false, "load_from_fp32_weights": true, "elastic_checkpoint": true, "offload_param": null, "offload_optimizer": null, "sub_group_size": 1.000000e+09, "prefetch_bucket_size": 5.000000e+07, "param_persistence_threshold": 1.000000e+05, "max_live_parameters": 1.000000e+09, "max_reuse_distance": 1.000000e+09, "gather_fp16_weights_on_model_save": false, "ignore_unused_parameters": true, "round_robin_gradients": false, "legacy_stage1": false } [2021-11-28 08:41:23,890] [INFO] [config.py:944:print] zero_enabled ................. 
True [2021-11-28 08:41:23,890] [INFO] [config.py:944:print] zero_optimization_stage ...... 1 [2021-11-28 08:41:23,890] [INFO] [config.py:946:print] json = { "train_micro_batch_size_per_gpu": 1, "train_batch_size": 512, "gradient_clipping": 1.0, "zero_optimization": { "stage": 1 }, "fp16": { "enabled": true, "loss_scale": 0, "loss_scale_window": 500, "hysteresis": 2, "min_loss_scale": 1, "initial_scale_power": 12 }, "steps_per_print": 2.000000e+03, "wall_clock_breakdown": false } [2021-11-28 08:41:23,890] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=16 micro_batch_size=1 [2021-11-28 08:41:23,918] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=15 [0, 15) STAGE_PARAMS=711516160 (711.516M) TOTAL_PARAMS=1423036416 (1423.036M) UNIQUE_PARAMS=1315819520 (1315.820M) [2021-11-28 08:41:23,919] [INFO] [engine.py:151:__init__] RANK=32 STAGE=1 LAYERS=17 [15, 32) STAGE_PARAMS=711520256 (711.520M) TOTAL_PARAMS=1423036416 (1423.036M) UNIQUE_PARAMS=1315819520 (1315.820M) > using checkpoint value 0.0002 for learning rate > using checkpoint value 1e-05 for minimum learning rate > using checkpoint value 183105 for warmup iterations > using checkpoint value 73242187 for total number of iterations > using checkpoint value cosine for decay style successfully loaded 32 ZeRO state_dicts for rank 52 successfully loaded 32 ZeRO state_dicts for rank 54 successfully loaded 32 ZeRO state_dicts for rank 53 successfully loaded 32 ZeRO state_dicts for rank 58 successfully loaded 32 ZeRO state_dicts for rank 56 successfully loaded 32 ZeRO state_dicts for rank 55 successfully loaded 32 ZeRO state_dicts for rank 34 successfully loaded 32 ZeRO state_dicts for rank 59 successfully loaded 32 ZeRO state_dicts for rank 33 successfully loaded 32 ZeRO state_dicts for rank 32 successfully loaded 32 ZeRO state_dicts for rank 35 successfully loaded 32 ZeRO state_dicts for rank 62 successfully loaded 32 ZeRO state_dicts for rank 61 successfully loaded 32 ZeRO state_dicts for rank 60 successfully
loaded 32 ZeRO state_dicts for rank 63 successfully loaded 32 ZeRO state_dicts for rank 40 successfully loaded 32 ZeRO state_dicts for rank 50 successfully loaded 32 ZeRO state_dicts for rank 51 successfully loaded 32 ZeRO state_dicts for rank 42 successfully loaded 32 ZeRO state_dicts for rank 41 successfully loaded 32 ZeRO state_dicts for rank 43 successfully loaded 32 ZeRO state_dicts for rank 48 successfully loaded 32 ZeRO state_dicts for rank 49 successfully loaded 32 ZeRO state_dicts for rank 44 successfully loaded 32 ZeRO state_dicts for rank 45 successfully loaded 32 ZeRO state_dicts for rank 47 successfully loaded 32 ZeRO state_dicts for rank 46 successfully loaded 32 ZeRO state_dicts for rank 57 successfully loaded 32 ZeRO state_dicts for rank 39 successfully loaded 32 ZeRO state_dicts for rank 36 successfully loaded 32 ZeRO state_dicts for rank 38 successfully loaded 32 ZeRO state_dicts for rank 37 successfully loaded 32 ZeRO state_dicts for rank 31 successfully loaded 32 ZeRO state_dicts for rank 4 successfully loaded 32 ZeRO state_dicts for rank 5 successfully loaded 32 ZeRO state_dicts for rank 6 successfully loaded 32 ZeRO state_dicts for rank 7 successfully loaded 32 ZeRO state_dicts for rank 30 successfully loaded 32 ZeRO state_dicts for rank 20 successfully loaded 32 ZeRO state_dicts for rank 29 successfully loaded 32 ZeRO state_dicts for rank 25 successfully loaded 32 ZeRO state_dicts for rank 8 successfully loaded 32 ZeRO state_dicts for rank 22 successfully loaded 32 ZeRO state_dicts for rank 28 successfully loaded 32 ZeRO state_dicts for rank 11 successfully loaded 32 ZeRO state_dicts for rank 23 successfully loaded 32 ZeRO state_dicts for rank 21 successfully loaded 32 ZeRO state_dicts for rank 24 successfully loaded 32 ZeRO state_dicts for rank 27 successfully loaded 32 ZeRO state_dicts for rank 9 successfully loaded 32 ZeRO state_dicts for rank 13 successfully loaded 32 ZeRO state_dicts for rank 12 successfully loaded 32 ZeRO state_dicts for
rank 0 successfully loaded 32 ZeRO state_dicts for rank 2 successfully loaded 32 ZeRO state_dicts for rank 17 successfully loaded 32 ZeRO state_dicts for rank 19 successfully loaded 32 ZeRO state_dicts for rank 16 successfully loaded 32 ZeRO state_dicts for rank 18 successfully loaded 32 ZeRO state_dicts for rank 26 successfully loaded 32 ZeRO state_dicts for rank 10 successfully loaded 32 ZeRO state_dicts for rank 3 loading 32 zero partition checkpoints for rank 34 loading 32 zero partition checkpoints for rank 33 loading 32 zero partition checkpoints for rank 56 loading 32 zero partition checkpoints for rank 35 loading 32 zero partition checkpoints for rank 32 loading 32 zero partition checkpoints for rank 58 loading 32 zero partition checkpoints for rank 55 loading 32 zero partition checkpoints for rank 54 loading 32 zero partition checkpoints for rank 52 successfully loaded 32 ZeRO state_dicts for rank 15 successfully loaded 32 ZeRO state_dicts for rank 14 loading 32 zero partition checkpoints for rank 63 loading 32 zero partition checkpoints for rank 60 loading 32 zero partition checkpoints for rank 50 loading 32 zero partition checkpoints for rank 62 loading 32 zero partition checkpoints for rank 40 loading 32 zero partition checkpoints for rank 51 loading 32 zero partition checkpoints for rank 44 loading 32 zero partition checkpoints for rank 43 loading 32 zero partition checkpoints for rank 49 loading 32 zero partition checkpoints for rank 53 loading 32 zero partition checkpoints for rank 59 loading 32 zero partition checkpoints for rank 46 loading 32 zero partition checkpoints for rank 42 loading 32 zero partition checkpoints for rank 36 loading 32 zero partition checkpoints for rank 57 loading 32 zero partition checkpoints for rank 48 loading 32 zero partition checkpoints for rank 39 loading 32 zero partition checkpoints for rank 37 loading 32 zero partition checkpoints for rank 61 loading 32 zero partition checkpoints for rank 47 loading 32 zero partition
checkpoints for rank 41 loading 32 zero partition checkpoints for rank 38 loading 32 zero partition checkpoints for rank 31 loading 32 zero partition checkpoints for rank 45 loading 32 zero partition checkpoints for rank 30 loading 32 zero partition checkpoints for rank 28 loading 32 zero partition checkpoints for rank 29 loading 32 zero partition checkpoints for rank 24 loading 32 zero partition checkpoints for rank 25 loading 32 zero partition checkpoints for rank 22 loading 32 zero partition checkpoints for rank 20 loading 32 zero partition checkpoints for rank 13 loading 32 zero partition checkpoints for rank 9 loading 32 zero partition checkpoints for rank 12 loading 32 zero partition checkpoints for rank 27 loading 32 zero partition checkpoints for rank 11 loading 32 zero partition checkpoints for rank 21 loading 32 zero partition checkpoints for rank 6 loading 32 zero partition checkpoints for rank 16 loading 32 zero partition checkpoints for rank 19 loading 32 zero partition checkpoints for rank 2 loading 32 zero partition checkpoints for rank 7 loading 32 zero partition checkpoints for rank 17 loading 32 zero partition checkpoints for rank 0 loading 32 zero partition checkpoints for rank 8 loading 32 zero partition checkpoints for rank 18 checkpoint version 3.0 loading 32 zero partition checkpoints for rank 23 loading 32 zero partition checkpoints for rank 26 loading 32 zero partition checkpoints for rank 10 successfully loaded 32 ZeRO state_dicts for rank 1 loading 32 zero partition checkpoints for rank 4 loading 32 zero partition checkpoints for rank 5 loading 32 zero partition checkpoints for rank 3 loading 32 zero partition checkpoints for rank 14 loading 32 zero partition checkpoints for rank 15 loading 32 zero partition checkpoints for rank 1 successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints at iteration 123761 time (ms) | load-checkpoint: 17890.77 
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the 
embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings")
estimated model parameters: 1.42303232
estimated model parameters without embeddings: 1.208598528
[after model, optimizer, and learning rate scheduler are built] datetime: 2021-11-28 08:41:41
> building train, validation, and test datasets ...
> datasets target sizes (minimum size):
train: 73242187
validation: 7833600
test: 51200
> building train, validation, and test datasets for GPT ...
> building dataset index ...
reading sizes...
reading pointers...
reading document index...
creating numpy buffer of mmap...
creating memory view of numpy buffer...
> finished creating indexed dataset in 3.762906 seconds
number of documents: 304230423
> dataset split:
train: document indices in [0, 288714672) total of 288714672 documents
validation: document indices in [288714672, 303926193) total of 15211521 documents
test: document indices in [303926193, 304230423) total of 304230 documents
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy
loaded indexed file in 0.063 seconds
total number of samples: 131537224
total number of epochs: 1
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy
loaded indexed file in 0.205 seconds
total number of samples: 13854322
total number of epochs: 2
> loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy
> loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy
> loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy
loaded indexed file in 0.073 seconds
total number of samples: 137384
total number of epochs: 1
> finished creating GPT datasets ...
[after dataloaders are built] datetime: 2021-11-28 08:41:51
done with setup ...
training ...
time (ms) | model-and-optimizer-setup: 22854.36 | train/valid/test-data-iterators-setup: 9159.40
Number of parameters: 1.423040512 billion
Number of parameters: 1.42303232 billion
Number of parameters without embeddings: 1.20860672 billion
Number of parameters without embeddings: 1.208598528 billion
[before the start of training step] datetime: 2021-11-28 08:41:51
[2021-11-28 08:41:51,405] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information
[2021-11-28 08:41:51,405] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False
[2021-11-28 08:41:51,405] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 24 total layers
[2021-11-28 08:41:51,405] [INFO] [checkpointing.py:554:forward] ----Synchronization False
[2021-11-28 08:41:51,405] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False
/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
if t.grad is not None:
/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward().
If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). 
If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. if t.grad is not None: /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
if t.grad is not None: [Rank 32] (after 123800 iterations) memory (MB) | allocated: 2443.63623046875 | max allocated: 4725.25341796875 | reserved: 7900.0 | max reserved: 7900.0 iteration 123800/ 152972 | consumed samples: 58305984 | consumed tokens: 119410655232 | elapsed time per iteration (ms): 4708.2 | learning rate: 2.923E-05 | global batch size: 512 | lm loss: 1.412080E+00 | loss scale: 131072.0 | grad norm: 16447.253 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [Rank 0] (after 123800 iterations) memory (MB) | allocated: 1631.6650390625 | max allocated: 3929.2744140625 | reserved: 6816.0 | max reserved: 6816.0 [2021-11-28 09:00:22,288] [INFO] [logging.py:68:log_dist] [Rank 0] step=124000, skipped=261, lr=[2.8983564637762527e-05, 2.8983564637762527e-05], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 124000 loss: 2.3068 iter time (s): 0.002 samples/sec: 221210.647 iteration 124000/ 152972 | consumed samples: 58408384 | consumed tokens: 119620370432 | elapsed time per iteration (ms): 4637.4 | learning rate: 2.898E-05 | global batch size: 512 | lm loss: 1.419161E+00 | loss scale: 131072.0 | grad norm: 21806.813 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | -------------------------------------------------------------------------------------------- valid loss at iteration 124000 | lm loss value: 1.402027E+00 | lm loss PPL: 4.063429E+00 | -------------------------------------------------------------------------------------------- iteration 124200/ 152972 | consumed samples: 58510784 | consumed tokens: 119830085632 | elapsed time per iteration (ms): 5196.9 | learning rate: 2.873E-05 | global batch size: 512 | lm loss: 1.413318E+00 | loss scale: 65536.0 | grad norm: 8896.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 124400/ 152972 | consumed samples: 58613184 | consumed tokens: 120039800832 | elapsed time per iteration (ms): 4640.5 | learning rate: 
2.849E-05 | global batch size: 512 | lm loss: 1.434154E+00 | loss scale: 65536.0 | grad norm: 6588.671 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | saving checkpoint at iteration 124500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-28 09:40:56,636] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/mp_rank_00_model_states.pt [2021-11-28 09:40:57,069] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,070] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,071] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,071] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,071] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,072] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,072] 
[INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,073] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,075] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,075] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,075] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,077] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,077] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,077] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,078] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,079] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,081] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,081] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,081] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,082] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,083] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,084] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,086] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,087] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,088] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,088] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,090] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,091] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,093] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,094] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,094] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,101] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,105] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,106] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,106] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,112] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,112] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,113] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,115] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,116] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,117] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,117] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,118] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,118] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,118] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,118] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,118] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,121] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,121] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,122] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,122] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,124] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,124] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,124] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,124] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,126] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,129] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,130] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,131] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,134] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,139] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-28 09:40:57,140] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-28 09:40:57,460] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_3_mp_rank_00_optim_states.pt
[2021-11-28 09:40:57,477] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step124500/zero_pp_rank_1_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 124500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 3190.40
iteration 124600/ 152972 | consumed samples: 58715584 | consumed tokens: 120249516032 | elapsed time per iteration (ms): 4662.3 | learning rate: 2.824E-05 | global batch size: 512 | lm loss: 1.459672E+00 | loss scale: 65536.0 | grad norm: 7917.561 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 124800/ 152972 | consumed samples: 58817984 | consumed tokens: 120459231232 | elapsed time per iteration (ms): 4659.4 | learning rate: 2.799E-05 | global batch size: 512 | lm loss: 1.396481E+00 | loss scale: 131072.0 | grad norm: 13481.116 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 125000/ 152972 | consumed samples: 58920384 | consumed tokens: 120668946432 | elapsed time per iteration (ms): 4657.8 | learning rate: 2.775E-05 | global batch size: 512 | lm loss: 1.427181E+00 | loss scale: 131072.0 | grad norm: 12227.622 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 125000 | lm loss value: 1.437351E+00 | lm loss PPL: 4.209531E+00 |
--------------------------------------------------------------------------------------------
iteration 125200/ 152972 | consumed samples: 59022784 | consumed tokens: 120878661632 | elapsed time per iteration (ms): 5189.3 | learning rate: 2.751E-05 | global batch size: 512 | lm loss: 1.391203E+00 | loss scale: 262144.0 | grad norm: 31248.337 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 125400/ 152972 | consumed samples: 59125184 | consumed tokens: 121088376832 | elapsed time per iteration (ms): 4648.4 | learning rate: 2.726E-05 | global batch size: 512 | lm loss: 1.400434E+00 | loss scale: 262144.0 | grad norm: 25449.091 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 125600/ 152972 | consumed samples: 59227584 | consumed tokens: 121298092032 | elapsed time per iteration (ms): 4631.2 | learning rate: 2.703E-05 | global batch size: 512 | lm loss: 1.402310E+00 | loss scale: 262144.0 | grad norm: 32064.311 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 125800/ 152972 | consumed samples: 59329984 | consumed tokens: 121507807232 | elapsed time per iteration (ms): 4635.7 | learning rate: 2.679E-05 | global batch size: 512 | lm loss: 1.410170E+00 | loss scale: 131072.0 | grad norm: 16449.651 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-28 11:38:55,893] [INFO] [logging.py:68:log_dist] [Rank 0] step=126000, skipped=265, lr=[2.655387495396744e-05, 2.655387495396744e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 126000/ 152972 | consumed samples: 59432384 | consumed tokens: 121717522432 | elapsed time per iteration (ms): 4646.4 | learning rate: 2.655E-05 | global batch size: 512 | lm loss: 1.448572E+00 | loss scale: 131072.0 | grad norm: 15695.628 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
steps: 126000 loss: 1.5075 iter time (s): 0.002 samples/sec: 220728.144
--------------------------------------------------------------------------------------------
valid loss at iteration 126000 | lm loss value: 1.446511E+00 | lm loss PPL: 4.248266E+00 |
-------------------------------------------------------------------------------------------- saving checkpoint at iteration 126000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-28 11:40:46,080] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/mp_rank_00_model_states.pt [2021-11-28 11:40:46,503] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,505] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,505] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,508] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,508] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,508] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,509] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,510] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,511] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,511] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,511] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,511] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,511] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,512] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,515] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,515] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,518] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,519] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,519] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,520] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,520] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,521] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,523] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,525] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,527] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,531] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,538] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,539] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,539] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,541] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,541] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,541] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,541] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,542] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,543] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,545] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,547] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,548] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,548] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,550] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,550] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,551] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,553] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,554] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,555] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,555] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,556] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,556] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,556] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,556] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,558] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,559] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,560] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,561] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,562] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,562] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,563] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,570] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-28 11:40:46,573] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,573] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,582] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,606] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-28 11:40:46,665] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_2_mp_rank_00_optim_states.pt
[2021-11-28 11:40:46,891] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step126000/zero_pp_rank_3_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 126000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2778.52
iteration 126200/ 152972 | consumed samples: 59534784 | consumed tokens: 121927237632 | elapsed time per iteration (ms): 5205.0 | learning rate: 2.632E-05 | global batch size: 512 | lm loss: 1.392023E+00 | loss scale: 131072.0 | grad norm: 20318.244 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 126400/ 152972 | consumed samples: 59637184 | consumed tokens: 122136952832 | elapsed time per iteration (ms): 4641.7 | learning rate: 2.609E-05 | global batch size: 512 | lm loss: 1.424204E+00 | loss scale: 262144.0 | grad norm: 41715.585 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 126600/ 152972 | consumed samples: 59739584 | consumed tokens: 122346668032 | elapsed time per iteration (ms): 4656.0 | learning rate: 2.586E-05 | global batch size: 512 | lm loss: 1.401861E+00 | loss scale: 131072.0 | grad norm: 13970.003 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 126800/ 152972 | consumed samples: 59841984 | consumed tokens: 122556383232 | elapsed time per iteration (ms): 4648.0 | learning rate: 2.563E-05 | global batch size: 512 | lm loss: 1.416212E+00 | loss scale: 65536.0 | grad norm: 8219.351 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 127000/ 152972 | consumed samples: 59944384 | consumed tokens: 122766098432 | elapsed time per iteration (ms): 4656.2 | learning rate: 2.540E-05 | global batch size: 512 | lm loss: 1.433258E+00 | loss scale: 65536.0 | grad norm: 12006.300 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 127000 | lm loss value: 1.322624E+00 | lm loss PPL: 3.753256E+00 |
--------------------------------------------------------------------------------------------
iteration 127200/ 152972 | consumed samples: 60046784 | consumed tokens: 122975813632 | elapsed time per iteration (ms): 5176.5 | learning rate: 2.517E-05 | global batch size: 512 | lm loss: 1.438067E+00 | loss scale: 131072.0 | grad norm: 17706.151 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 127400/ 152972 | consumed samples: 60149184 | consumed tokens: 123185528832 | elapsed time per iteration (ms): 4647.0 | learning rate: 2.494E-05 | global batch size: 512 | lm loss: 1.446651E+00 | loss scale: 131072.0 | grad norm: 15006.843 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 127500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-28 13:38:50,127] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/mp_rank_00_model_states.pt
[2021-11-28 13:38:50,548] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_21_mp_rank_01_optim_states.pt
[2021-11-28 13:38:50,549] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_0_mp_rank_01_optim_states.pt
[2021-11-28 13:38:50,551] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-28 13:38:50,551] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-28 13:38:50,555] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-28 13:38:50,556] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-28 13:38:50,557] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-28 13:38:50,558] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-28 13:38:50,559] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-28 13:38:50,559] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-28 13:38:50,560] [INFO] [engine.py:2540:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-28 13:38:50,560] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-28 13:38:50,561] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-28 13:38:50,561] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-28 13:38:50,564] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-28 13:38:50,564] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-28 13:38:50,565] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-28 13:38:50,566] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-28 13:38:50,567] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-28 13:38:50,567] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-28 13:38:50,573] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-28 13:38:50,573] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-28 13:38:50,574] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-28 13:38:50,575] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-28 13:38:50,575] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-28 13:38:50,575] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-28 13:38:50,578] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-28 13:38:50,579] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-28 13:38:50,585] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-28 13:38:50,585] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-28 13:38:50,586] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-28 13:38:50,587] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-28 13:38:50,589] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-28 13:38:50,591] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-28 13:38:50,591] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-28 13:38:50,591] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-28 13:38:50,591] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-28 13:38:50,592] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-28 13:38:50,595] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-28 13:38:50,595] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-28 13:38:50,597] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-28 13:38:50,598] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-28 13:38:50,598] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-28 13:38:50,599] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-28 13:38:50,599] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-28 13:38:50,599] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-28 13:38:50,600] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-28 13:38:50,601] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-28 13:38:50,602] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-28 13:38:50,607] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-28 13:38:50,607] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-28 13:38:50,609] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-28 13:38:50,610] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-28 13:38:50,619] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-28 13:38:50,619] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-28 13:38:50,622] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-28 13:38:50,626] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-28 13:38:50,626] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-28 13:38:50,628] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_14_mp_rank_00_optim_states.pt
[2021-11-28 13:38:50,630] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_13_mp_rank_00_optim_states.pt
[2021-11-28 13:38:50,642] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_10_mp_rank_01_optim_states.pt
[2021-11-28 13:38:50,650] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_12_mp_rank_00_optim_states.pt
[2021-11-28 13:38:50,929] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2021-11-28 13:38:50,951] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step127500/zero_pp_rank_1_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 127500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 3350.68
iteration 127600/ 152972 | consumed samples: 60251584 | consumed tokens: 123395244032 | elapsed time per iteration (ms): 4660.8 | learning rate: 2.472E-05 | global batch size: 512 | lm loss: 1.402646E+00 | loss scale: 65536.0 | grad norm: 7937.833 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 127800/ 152972 | consumed samples: 60353984 | consumed tokens: 123604959232 | elapsed time per iteration (ms): 4638.9 | learning rate: 2.450E-05 | global batch size: 512 | lm loss: 1.387233E+00 | loss scale: 65536.0 | grad norm: 6037.633 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-28 14:17:31,761] [INFO] [logging.py:68:log_dist] [Rank 0] step=128000, skipped=271, lr=[2.4277856329684335e-05, 2.4277856329684335e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 128000 loss: 2.2609 iter time (s): 0.002 samples/sec: 220134.720
iteration 128000/ 152972 | consumed samples: 60456384 | consumed tokens: 123814674432 | elapsed time per iteration (ms): 4649.2 | learning rate: 2.428E-05 | global batch size: 512 | lm loss: 1.505510E+00 | loss scale: 131072.0 | grad norm: 22496.373 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 128000 | lm loss value: 1.385773E+00 | lm loss PPL: 3.997914E+00 |
--------------------------------------------------------------------------------------------
iteration 128200/ 152972 | consumed samples: 60558784 | consumed tokens: 124024389632 | elapsed time per iteration (ms): 5205.3 | learning rate: 2.406E-05 | global batch size: 512 | lm loss: 1.416476E+00 | loss scale: 131072.0 | grad norm: 16377.644 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 128400/ 152972 | consumed samples: 60661184 | consumed tokens: 124234104832 | elapsed time per iteration (ms): 4653.5 | learning rate: 2.384E-05 | global batch size: 512 | lm loss: 1.390215E+00 | loss scale: 131072.0 | grad norm: 10603.454 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 128600/ 152972 | consumed samples: 60763584 | consumed tokens: 124443820032 | elapsed time per iteration (ms): 4657.9 | learning rate: 2.362E-05 | global batch size: 512 | lm loss: 1.400957E+00 | loss scale: 262144.0 | grad norm: 19008.593 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 128800/ 152972 | consumed samples: 60865984 | consumed tokens: 124653535232 | elapsed time per iteration (ms): 4654.2 | learning rate: 2.341E-05 | global batch size: 512 | lm loss: 1.405600E+00 | loss scale: 262144.0 | grad norm: 39830.639 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 129000/ 152972 | consumed samples: 60968384 | consumed tokens: 124863250432 | elapsed time per iteration (ms): 4639.5 | learning rate: 2.320E-05 | global batch size: 512 | lm loss: 1.448699E+00 | loss scale: 131072.0 | grad norm: 20086.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 129000 | lm loss value: 1.389515E+00 | lm loss PPL: 4.012903E+00 |
--------------------------------------------------------------------------------------------
saving checkpoint at iteration 129000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-28 15:38:46,484] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/mp_rank_00_model_states.pt
[2021-11-28 15:38:46,906] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_4_mp_rank_01_optim_states.pt
[2021-11-28 15:38:46,908] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_20_mp_rank_00_optim_states.pt
[2021-11-28 15:38:46,910] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-28 15:38:46,911] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-28 15:38:46,911] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-28 15:38:46,912] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-28 15:38:46,913] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-28 15:38:46,913] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-28 15:38:46,913] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-28 15:38:46,914] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-28 15:38:46,914] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-28 15:38:46,916] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-28 15:38:46,916] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-28 15:38:46,918] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-28 15:38:46,922] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-28 15:38:46,922] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-28 15:38:46,924] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-28 15:38:46,924] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-28 15:38:46,924] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-28 15:38:46,929] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-28 15:38:46,929] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-28 15:38:46,931] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-28 15:38:46,934] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-28 15:38:46,934] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-28 15:38:46,936] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-28 15:38:46,937] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-28 15:38:46,938] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-28 15:38:46,939] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-28 15:38:46,940] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-28 15:38:46,943] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-28 15:38:46,943] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-28 15:38:46,944] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-28 15:38:46,944] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-28 15:38:46,944] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-28 15:38:46,946] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-28 15:38:46,948] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-28 15:38:46,949] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-28 15:38:46,949] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-28 15:38:46,950] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-28 15:38:46,950] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-28 15:38:46,951] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-28 15:38:46,951] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-28 15:38:46,952] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-28 15:38:46,953] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-28 15:38:46,953] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-28 15:38:46,953] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-28 15:38:46,953] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-28 15:38:46,954] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-28 15:38:46,956] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-28 15:38:46,956] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-28 15:38:46,957] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-28 15:38:46,958] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-28 15:38:46,963] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-28 15:38:46,963] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-28 15:38:46,964] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-28 15:38:46,969] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-28 15:38:46,972] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-28 15:38:46,975] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-28 15:38:46,976] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_26_mp_rank_00_optim_states.pt
[2021-11-28 15:38:46,989] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_7_mp_rank_01_optim_states.pt
[2021-11-28 15:38:46,995] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_11_mp_rank_01_optim_states.pt
[2021-11-28 15:38:47,006] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_13_mp_rank_00_optim_states.pt
[2021-11-28 15:38:47,286] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_0_mp_rank_00_optim_states.pt
[2021-11-28 15:38:47,295] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step129000/zero_pp_rank_1_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 129000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 3447.30
iteration 129200/ 152972 | consumed samples: 61070784 | consumed tokens: 125072965632 | elapsed time per iteration (ms): 5211.2 | learning rate: 2.298E-05 | global batch size: 512 | lm loss: 1.467468E+00 | loss scale: 131072.0 | grad norm: 15430.321 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 129400/ 152972 | consumed samples: 61173184 | consumed tokens: 125282680832 | elapsed time per iteration (ms): 4657.5 | learning rate: 2.278E-05 | global batch size: 512 | lm loss: 1.379878E+00 | loss scale: 262144.0 | grad norm: 33053.577 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 129600/ 152972 | consumed samples: 61275584 | consumed tokens: 125492396032 | elapsed time per iteration (ms): 4648.4 | learning rate: 2.257E-05 | global batch size: 512 | lm loss: 1.424831E+00 | loss scale: 131072.0 | grad norm: 12776.227 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 129800/ 152972 | consumed samples: 61377984 | consumed tokens: 125702111232 | elapsed time per iteration (ms): 4642.7 | learning rate: 2.236E-05 | global batch size: 512 | lm loss: 1.423897E+00 | loss scale: 131072.0 | grad norm: 20396.083 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-28 16:56:14,300] [INFO] [logging.py:68:log_dist] [Rank 0] step=130000, skipped=275, lr=[2.2155338354725775e-05, 2.2155338354725775e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 130000 loss: 1.6746 iter time (s): 0.002 samples/sec: 220654.117
iteration 130000/ 152972 | consumed samples: 61480384 | consumed tokens: 125911826432 | elapsed time per iteration (ms): 4642.5 | learning rate: 2.216E-05 | global batch size: 512 | lm loss: 1.414935E+00 | loss scale: 131072.0 | grad norm: 17880.871 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 130000 | lm loss value: 1.344580E+00 | lm loss PPL: 3.836576E+00 |
--------------------------------------------------------------------------------------------
iteration 130200/ 152972 | consumed samples: 61582784 | consumed tokens: 126121541632 | elapsed time per iteration (ms): 5184.8 | learning rate: 2.195E-05 | global batch size: 512 | lm loss: 1.426712E+00 | loss scale: 262144.0 | grad norm: 32999.368 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 130400/ 152972 | consumed samples: 61685184 | consumed tokens: 126331256832 | elapsed time per iteration (ms): 4647.6 | learning rate: 2.175E-05 | global batch size: 512 | lm loss: 1.459194E+00 | loss scale: 65536.0 | grad norm: 6636.134 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 130500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-28 17:36:46,477] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/mp_rank_00_model_states.pt
[2021-11-28 17:36:46,898] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_12_mp_rank_01_optim_states.pt
[2021-11-28 17:36:46,904] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_4_mp_rank_01_optim_states.pt
[2021-11-28 17:36:46,904] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_31_mp_rank_01_optim_states.pt
[2021-11-28 17:36:46,906] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_27_mp_rank_01_optim_states.pt
[2021-11-28 17:36:46,908] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_21_mp_rank_01_optim_states.pt
[2021-11-28 17:36:46,908] [INFO]
[engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-28 17:36:46,909] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-28 17:36:46,909] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-28 17:36:46,909] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-28 17:36:46,909] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-28 17:36:46,910] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-28 17:36:46,910] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-28 17:36:46,910] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-28 17:36:46,911] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-28 17:36:46,912] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-28 17:36:46,912] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-28 17:36:46,913] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-28 17:36:46,913] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-28 17:36:46,914] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-28 17:36:46,915] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-28 17:36:46,915] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-28 17:36:46,916] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-28 17:36:46,917] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-28 17:36:46,918] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-28 17:36:46,922] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-28 17:36:46,923] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-28 17:36:46,926] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-28 17:36:46,929] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-28 17:36:46,933] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-28 17:36:46,935] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-28 17:36:46,937] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-28 17:36:46,938] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-28 17:36:46,940] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-28 17:36:46,940] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-28 17:36:46,940] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-28 17:36:46,941] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-28 17:36:46,943] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-28 17:36:46,943] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-28 17:36:46,944] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-28 17:36:46,944] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-28 17:36:46,944] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-28 17:36:46,944] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-28 17:36:46,944] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-28 17:36:46,945] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-28 17:36:46,945] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-28 17:36:46,947] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-28 17:36:46,951] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-28 17:36:46,953] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-28 17:36:46,956] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-28 17:36:46,958] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-28 17:36:46,960] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-28 17:36:46,964] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-28 17:36:46,965] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-28 17:36:46,965] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-28 17:36:46,967] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-28 17:36:46,968] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-28 17:36:46,978] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-28 17:36:46,978] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-28 17:36:46,979] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-28 17:36:46,980] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-28 17:36:46,980] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-28 17:36:46,981] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_31_mp_rank_00_optim_states.pt
[2021-11-28 17:36:47,255] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_2_mp_rank_00_optim_states.pt
[2021-11-28 17:36:47,292] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step130500/zero_pp_rank_1_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 130500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2840.80
iteration 130600/ 152972 | consumed samples: 61787584 | consumed tokens: 126540972032 | elapsed time per iteration (ms): 4646.2 | learning rate: 2.155E-05 | global batch size: 512 | lm loss: 1.429080E+00 | loss scale: 65536.0 | grad norm: 5737.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 130800/ 152972 | consumed samples: 61889984 | consumed tokens: 126750687232 | elapsed time per iteration (ms): 4631.6 | learning rate: 2.135E-05 | global batch size: 512 | lm loss: 1.404430E+00 | loss scale: 65536.0 | grad norm: 8712.450 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 131000/ 152972 | consumed samples: 61992384 | consumed tokens: 126960402432 | elapsed time per iteration (ms): 4642.3 | learning rate: 2.115E-05 | global batch size: 512 | lm loss: 1.378702E+00 | loss scale: 131072.0 | grad norm: 16129.330 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 131000 | lm loss value: 1.390367E+00 | lm loss PPL: 4.016325E+00 |
--------------------------------------------------------------------------------------------
iteration 131200/ 152972 | consumed samples: 62094784 | consumed tokens: 127170117632 | elapsed time per iteration (ms): 5187.1 | learning rate: 2.096E-05 | global batch size: 512 | lm loss: 1.465832E+00 | loss scale: 131072.0 | grad norm: 10651.032 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 131400/ 152972 | consumed samples: 62197184 | consumed tokens: 127379832832 | elapsed time per iteration (ms): 4644.7 | learning rate: 2.076E-05 | global batch size: 512 | lm loss: 1.418712E+00 | loss scale: 262144.0 | grad norm: 28218.598 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 131600/ 152972 | consumed samples: 62299584 | consumed tokens: 127589548032 | elapsed time per iteration (ms): 4654.2 | learning rate: 2.057E-05 | global batch size: 512 | lm loss: 1.471161E+00 | loss scale: 262144.0 | grad norm: 33799.091 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 131800/ 152972 | consumed samples: 62401984 | consumed tokens: 127799263232 | elapsed time per iteration (ms): 4645.6 | learning rate: 2.038E-05 | global batch size: 512 | lm loss: 1.410114E+00 | loss scale: 131072.0 | grad norm: 16819.770 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-28 19:34:42,698] [INFO] [logging.py:68:log_dist] [Rank 0] step=132000, skipped=280, lr=[2.0193721296114485e-05, 2.0193721296114485e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 132000/ 152972 | consumed samples: 62504384 | consumed tokens: 128008978432 | elapsed time per iteration (ms): 4657.9 | learning rate: 2.019E-05 | global batch size: 512 | lm loss: 1.443113E+00 | loss scale: 131072.0 | grad norm: 16350.220 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
steps: 132000 loss: 1.5586 iter time (s): 0.002 samples/sec: 219521.643
--------------------------------------------------------------------------------------------
valid loss at iteration 132000 | lm loss value: 1.475952E+00 | lm loss PPL: 4.375200E+00 |
--------------------------------------------------------------------------------------------
saving checkpoint at iteration 132000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-28 19:36:33,054] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/mp_rank_00_model_states.pt
[2021-11-28 19:36:33,478] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_31_mp_rank_00_optim_states.pt
[2021-11-28 19:36:33,481] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_31_mp_rank_01_optim_states.pt
[2021-11-28 19:36:33,483] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_20_mp_rank_01_optim_states.pt
[2021-11-28 19:36:33,483] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_28_mp_rank_00_optim_states.pt
[2021-11-28 19:36:33,484] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_21_mp_rank_01_optim_states.pt
[2021-11-28 19:36:33,484] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-28 19:36:33,485] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-28 19:36:33,485] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-28 19:36:33,486] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-28 19:36:33,487] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-28 19:36:33,488] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-28 19:36:33,488] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-28 19:36:33,488] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-28 19:36:33,489] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-28 19:36:33,489] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-28 19:36:33,492] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-28 19:36:33,492] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-28 19:36:33,493] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-28 19:36:33,494] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-28 19:36:33,495] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-28 19:36:33,499] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-28 19:36:33,499] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-28 19:36:33,501] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-28 19:36:33,501] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-28 19:36:33,503] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-28 19:36:33,503] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-28 19:36:33,505] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-28 19:36:33,506] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-28 19:36:33,512] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-28 19:36:33,513] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-28 19:36:33,518] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-28 19:36:33,518] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-28 19:36:33,519] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-28 19:36:33,519] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-28 19:36:33,519] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-28 19:36:33,520] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-28 19:36:33,521] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-28 19:36:33,521] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-28 19:36:33,521] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-28 19:36:33,521] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-28 19:36:33,522] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-28 19:36:33,522] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-28 19:36:33,523] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-28 19:36:33,524] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-28 19:36:33,524] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-28 19:36:33,524] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-28 19:36:33,524] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-28 19:36:33,525] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-28 19:36:33,525] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-28 19:36:33,526] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-28 19:36:33,529] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-28 19:36:33,529] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-28 19:36:33,532] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-28 19:36:33,533] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-28 19:36:33,533] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-28 19:36:33,533] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-28 19:36:33,538] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-28 19:36:33,539] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-28 19:36:33,541] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-28 19:36:33,548] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-28 19:36:33,553] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-28 19:36:33,585] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_13_mp_rank_00_optim_states.pt
[2021-11-28 19:36:33,847] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_2_mp_rank_00_optim_states.pt
[2021-11-28 19:36:33,867] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step132000/zero_pp_rank_3_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 132000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2865.11
iteration 132200/ 152972 | consumed samples: 62606784 | consumed tokens: 128218693632 | elapsed time per iteration (ms): 5204.1 | learning rate: 2.001E-05 | global batch size: 512 | lm loss: 1.430495E+00 | loss scale: 262144.0 | grad norm: 30926.099 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 132400/ 152972 | consumed samples: 62709184 | consumed tokens: 128428408832 | elapsed time per iteration (ms): 4635.5 | learning rate: 1.982E-05 | global batch size: 512 | lm loss: 1.380152E+00 | loss scale: 65536.0 | grad norm: 8170.594 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 132600/ 152972 | consumed samples: 62811584 | consumed tokens: 128638124032 | elapsed time per iteration (ms): 4653.0 | learning rate: 1.964E-05 | global batch size: 512 | lm loss: 1.420517E+00 | loss scale: 65536.0 | grad norm: 6643.409 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 132800/ 152972 | consumed samples: 62913984 | consumed tokens: 128847839232 | elapsed time per iteration (ms): 4650.3 | learning rate: 1.946E-05 | global batch size: 512 | lm loss: 1.408168E+00 | loss scale: 65536.0 | grad norm: 10984.512 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 133000/ 152972 | consumed samples: 63016384 | consumed tokens: 129057554432 | elapsed time per iteration (ms): 4649.0 | learning rate: 1.927E-05 | global batch size: 512 | lm loss: 1.377173E+00 | loss scale: 131072.0 | grad norm: 18808.060 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 133000 | lm loss value: 1.404806E+00 | lm loss PPL: 4.074738E+00 |
--------------------------------------------------------------------------------------------
iteration 133200/ 152972 | consumed samples: 63118784 | consumed tokens: 129267269632 | elapsed time per iteration (ms): 5191.8 | learning rate: 1.910E-05 | global batch size: 512 | lm loss: 1.389131E+00 | loss scale: 131072.0 | grad norm: 17763.792 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 133400/ 152972 | consumed samples: 63221184 | consumed tokens: 129476984832 | elapsed time per iteration (ms): 4661.7 | learning rate: 1.892E-05 | global batch size: 512 | lm loss: 1.387055E+00 | loss scale: 262144.0 | grad norm: 29495.469 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 133500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-28 21:34:38,725] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/mp_rank_00_model_states.pt
[2021-11-28 21:34:39,142] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_20_mp_rank_00_optim_states.pt
[2021-11-28 21:34:39,145] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,149] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,152] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,154] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,156] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,156] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,157] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,160] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,161] [INFO] [engine.py:2540:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,161] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,161] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,161] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,163] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,163] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,164] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,164] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,165] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,165] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,167] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,168] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,169] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,174] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,174] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,182] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,183] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,183] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,184] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,188] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,189] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,189] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,189] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,189] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,190] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,190] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,191] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,191] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,192] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,193] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,193] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,194] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,193] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,196] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,196] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,196] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,197] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,198] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,199] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,199] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,199] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,200] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,203] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,203] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,203] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,207] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,208] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,214] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,227] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-28 21:34:39,228] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,289] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,317] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,336] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,515] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-28 21:34:39,530] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step133500/zero_pp_rank_3_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 133500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2812.55 iteration 133600/ 152972 | consumed samples: 63323584 | consumed tokens: 129686700032 | elapsed time per iteration (ms): 4667.0 | learning rate: 1.874E-05 | global batch size: 512 | lm loss: 1.440424E+00 | loss scale: 262144.0 | grad norm: 
27756.082 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 133800/ 152972 | consumed samples: 63425984 | consumed tokens: 129896415232 | elapsed time per iteration (ms): 4644.0 | learning rate: 1.857E-05 | global batch size: 512 | lm loss: 1.407191E+00 | loss scale: 262144.0 | grad norm: 29603.285 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-28 22:13:24,616] [INFO] [logging.py:68:log_dist] [Rank 0] step=134000, skipped=286, lr=[1.839654583203192e-05, 1.839654583203192e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 134000 loss: 1.1950 iter time (s): 0.002 samples/sec: 220647.520
iteration 134000/ 152972 | consumed samples: 63528384 | consumed tokens: 130106130432 | elapsed time per iteration (ms): 4653.2 | learning rate: 1.840E-05 | global batch size: 512 | lm loss: 1.414057E+00 | loss scale: 131072.0 | grad norm: 14477.552 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 134000 | lm loss value: 1.352838E+00 | lm loss PPL: 3.868388E+00 |
--------------------------------------------------------------------------------------------
iteration 134200/ 152972 | consumed samples: 63630784 | consumed tokens: 130315845632 | elapsed time per iteration (ms): 5189.6 | learning rate: 1.823E-05 | global batch size: 512 | lm loss: 1.387214E+00 | loss scale: 131072.0 | grad norm: 14813.549 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 134400/ 152972 | consumed samples: 63733184 | consumed tokens: 130525560832 | elapsed time per iteration (ms): 4651.4 | learning rate: 1.806E-05 | global batch size: 512 | lm loss: 1.374727E+00 | loss scale: 131072.0 | grad norm: 12403.269 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 134600/ 152972 | consumed samples: 63835584 | consumed tokens: 130735276032 | elapsed time per iteration (ms): 4656.6 | learning rate: 1.789E-05 | global batch size: 512 | lm loss: 1.409917E+00 | loss scale: 131072.0 | grad norm: 15623.064 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 134800/ 152972 | consumed samples: 63937984 | consumed tokens: 130944991232 | elapsed time per iteration (ms): 4640.4 | learning rate: 1.772E-05 | global batch size: 512 | lm loss: 1.407207E+00 | loss scale: 65536.0 | grad norm: 7237.235 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 135000/ 152972 | consumed samples: 64040384 | consumed tokens: 131154706432 | elapsed time per iteration (ms): 4638.1 | learning rate: 1.756E-05 | global batch size: 512 | lm loss: 1.425891E+00 | loss scale: 65536.0 | grad norm: 8786.597 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 135000 | lm loss value: 1.334590E+00 | lm loss PPL: 3.798439E+00 |
--------------------------------------------------------------------------------------------
saving checkpoint at iteration 135000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-28 23:34:29,997] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/mp_rank_00_model_states.pt
[2021-11-28 23:34:30,419] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_13_mp_rank_01_optim_states.pt
[2021-11-28 23:34:30,423] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,424] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,424] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,425] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,426] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,427] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,427] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,427] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,428] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,429] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,430] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,433] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,433] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,434] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,435] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,436] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,437] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,438] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,440] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,440] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,441] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,443] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,447] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,450] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,453] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,453] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,454] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,456] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,457] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,457] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,458] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,459] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,459] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,463] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,463] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,464] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,464] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,464] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,465] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,466] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,466] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,467] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,468] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,470] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,470] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,470] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,471] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,472] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,472] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,473] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,473] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,473] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,474] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,476] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,476] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,477] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,478] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,484] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,495] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,504] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-28 23:34:30,512] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,777] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-28 23:34:30,811] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step135000/zero_pp_rank_3_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 135000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2818.48 iteration 135200/ 152972 | consumed samples: 64142784 | consumed tokens: 131364421632 | elapsed time per iteration (ms): 5211.0 | learning rate: 1.740E-05 | global batch size: 512 | lm loss: 1.430121E+00 | loss scale: 32768.0 | grad norm: 
3403.148 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 135400/ 152972 | consumed samples: 64245184 | consumed tokens: 131574136832 | elapsed time per iteration (ms): 4638.1 | learning rate: 1.724E-05 | global batch size: 512 | lm loss: 1.449781E+00 | loss scale: 32768.0 | grad norm: 4634.987 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 135600/ 152972 | consumed samples: 64347584 | consumed tokens: 131783852032 | elapsed time per iteration (ms): 4642.1 | learning rate: 1.708E-05 | global batch size: 512 | lm loss: 1.403755E+00 | loss scale: 32768.0 | grad norm: 2432.622 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 135800/ 152972 | consumed samples: 64449984 | consumed tokens: 131993567232 | elapsed time per iteration (ms): 4645.3 | learning rate: 1.692E-05 | global batch size: 512 | lm loss: 1.439139E+00 | loss scale: 65536.0 | grad norm: 8397.407 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-29 00:51:56,544] [INFO] [logging.py:68:log_dist] [Rank 0] step=136000, skipped=290, lr=[1.6764701069801866e-05, 1.6764701069801866e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 136000/ 152972 | consumed samples: 64552384 | consumed tokens: 132203282432 | elapsed time per iteration (ms): 4647.0 | learning rate: 1.676E-05 | global batch size: 512 | lm loss: 1.391651E+00 | loss scale: 65536.0 | grad norm: 10034.602 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
steps: 136000 loss: 1.9536 iter time (s): 0.002 samples/sec: 219374.793
--------------------------------------------------------------------------------------------
valid loss at iteration 136000 | lm loss value: 1.391531E+00 | lm loss PPL: 4.021001E+00 |
--------------------------------------------------------------------------------------------
iteration 136200/ 152972 | consumed samples: 64654784
| consumed tokens: 132412997632 | elapsed time per iteration (ms): 5193.1 | learning rate: 1.661E-05 | global batch size: 512 | lm loss: 1.455036E+00 | loss scale: 131072.0 | grad norm: 16498.206 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 136400/ 152972 | consumed samples: 64757184 | consumed tokens: 132622712832 | elapsed time per iteration (ms): 4649.4 | learning rate: 1.646E-05 | global batch size: 512 | lm loss: 1.411784E+00 | loss scale: 131072.0 | grad norm: 13506.008 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | saving checkpoint at iteration 136500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-29 01:32:33,503] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/mp_rank_00_model_states.pt [2021-11-29 01:32:33,929] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,930] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,931] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,932] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,932] [INFO] [engine.py:2540:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-29 01:32:33,933] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-29 01:32:33,933] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,933] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,935] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-29 01:32:33,935] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-29 01:32:33,935] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,936] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,936] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,936] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,936] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,937] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,938] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-29 01:32:33,939] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-29 01:32:33,941] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-29 01:32:33,941] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-29 01:32:33,943] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-29 01:32:33,946] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,948] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-29 01:32:33,951] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-29 01:32:33,952] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-29 01:32:33,956] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,958] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,959] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-29 01:32:33,960] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,963] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-29 01:32:33,964] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-29 01:32:33,965] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-29 01:32:33,967] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-29 01:32:33,968] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,968] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,969] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,969] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,969] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-29 01:32:33,970] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,970] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-29 01:32:33,970] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,971] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,973] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-29 01:32:33,974] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,974] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,974] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-29 01:32:33,975] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,976] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,978] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-29 01:32:33,979] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,980] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-29 01:32:33,980] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,981] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,984] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-29 01:32:33,984] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-29 01:32:33,984] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-29 01:32:33,984] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-29 01:32:33,986] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-29 01:32:33,986] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-29 01:32:33,989] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-29 01:32:34,006] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-29 01:32:34,049] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-29 01:32:34,306] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-29 01:32:34,328] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step136500/zero_pp_rank_3_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 136500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2852.46 iteration 136600/ 152972 | consumed samples: 64859584 | consumed tokens: 132832428032 | elapsed time per iteration (ms): 4679.3 | learning rate: 1.631E-05 | global batch size: 512 | lm loss: 1.440709E+00 | loss scale: 131072.0 | grad norm: 14406.487 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 136800/ 152972 | consumed samples: 64961984 | consumed tokens: 133042143232 | elapsed time per iteration (ms): 4644.8 | learning rate: 1.616E-05 | global batch size: 512 | lm loss: 1.397358E+00 | loss scale: 262144.0 | grad norm: 19240.495 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 137000/ 152972 | consumed samples: 65064384 | consumed tokens: 133251858432 | elapsed time per iteration (ms): 4648.3 | learning rate: 1.601E-05 | global batch size: 512 | lm loss: 1.410344E+00 | loss scale: 131072.0 | grad norm: 13369.652 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 137000 | lm loss value: 1.448801E+00 | lm loss PPL: 4.258005E+00 |
--------------------------------------------------------------------------------------------
iteration 137200/ 152972 | consumed samples: 65166784 | consumed tokens: 133461573632 | elapsed time per iteration (ms): 5183.6 | learning rate: 1.587E-05 | global batch size: 512 | lm loss: 1.437702E+00 | loss scale: 131072.0 | grad norm: 15268.917 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 137400/ 152972 | consumed samples: 65269184 | consumed tokens: 133671288832 | elapsed time per iteration (ms): 4659.9 | learning rate: 1.572E-05 | global batch size: 512 | lm loss: 1.405018E+00 | loss scale: 131072.0 | grad norm: 17432.153 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 137600/ 152972 | consumed samples: 65371584 | consumed tokens: 133881004032 | elapsed time per iteration (ms): 4653.2 | learning rate: 1.558E-05 | global batch size: 512 | lm loss: 1.429552E+00 | loss scale: 262144.0 | grad norm: 30347.371 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 137800/ 152972 | consumed samples: 65473984 | consumed tokens: 134090719232 | elapsed time per iteration (ms): 4644.6 | learning rate: 1.544E-05 | global batch size: 512 | lm loss: 1.446872E+00 | loss scale: 65536.0 | grad norm: 9272.526 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-29 03:30:35,344] [INFO] [logging.py:68:log_dist] [Rank 0] step=138000, skipped=295, lr=[1.5303912101312385e-05, 1.5303912101312385e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
iteration 138000/ 152972 | consumed samples: 65576384 | consumed tokens: 134300434432 | elapsed time per iteration (ms): 4637.8 | learning rate: 1.530E-05 | global
batch size: 512 | lm loss: 1.381682E+00 | loss scale: 65536.0 | grad norm: 10285.752 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | steps: 138000 loss: 1.9639 iter time (s): 0.002 samples/sec: 220781.121 -------------------------------------------------------------------------------------------- valid loss at iteration 138000 | lm loss value: 1.399287E+00 | lm loss PPL: 4.052309E+00 | -------------------------------------------------------------------------------------------- saving checkpoint at iteration 138000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-29 03:32:25,649] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/mp_rank_00_model_states.pt [2021-11-29 03:32:26,072] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,073] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,074] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,075] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,075] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,077] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,077] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,079] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,080] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,083] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-29 03:32:26,084] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,085] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-29 03:32:26,085] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-29 03:32:26,085] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,086] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-29 03:32:26,089] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,089] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-29 03:32:26,090] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,092] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-29 03:32:26,096] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-29 03:32:26,097] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-29 03:32:26,099] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,102] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-29 03:32:26,103] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,106] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,106] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,107] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-29 03:32:26,108] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-29 03:32:26,109] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-29 03:32:26,109] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,110] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,111] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-29 03:32:26,111] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-29 03:32:26,111] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-29 03:32:26,113] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-29 03:32:26,114] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,114] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-29 03:32:26,114] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-29 03:32:26,115] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,115] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,116] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-29 03:32:26,116] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-29 03:32:26,116] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-29 03:32:26,117] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,121] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,122] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,122] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-29 03:32:26,123] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,124] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,125] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-29 03:32:26,125] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-29 03:32:26,127] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-29 03:32:26,128] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,131] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-29 03:32:26,135] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,136] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,136] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,138] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-29 03:32:26,139] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-29 03:32:26,140] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-29 03:32:26,179] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_15_mp_rank_00_optim_states.pt
[2021-11-29 03:32:26,180] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_13_mp_rank_00_optim_states.pt
[2021-11-29 03:32:26,410] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_1_mp_rank_00_optim_states.pt
[2021-11-29 03:32:26,454] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138000/zero_pp_rank_3_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 138000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2778.09
iteration 138200/ 152972 | consumed samples: 65678784 | consumed tokens: 134510149632 | elapsed time per iteration (ms): 5202.9 | learning rate: 1.517E-05 | global batch size: 512 | lm loss: 1.432905E+00 | loss scale: 65536.0 | grad norm: 8435.686 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 138400/ 152972 | consumed samples: 65781184 | consumed tokens: 134719864832 | elapsed time per iteration (ms): 4646.4 | learning rate: 1.503E-05 | global batch size: 512 | lm loss: 1.466470E+00 | loss scale: 131072.0 | grad norm: 16373.885 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 138600/ 152972 | consumed samples: 65883584 | consumed tokens: 134929580032 | elapsed time per iteration (ms): 4650.1 | learning rate: 1.490E-05 | global batch size: 512 | lm loss: 1.467036E+00 | loss scale: 131072.0 | grad norm: 21554.114 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 138753 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-29 04:30:50,308] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/mp_rank_00_model_states.pt
[2021-11-29 04:30:50,731] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_4_mp_rank_01_optim_states.pt
[2021-11-29 04:30:50,732] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_30_mp_rank_00_optim_states.pt
[2021-11-29 04:30:50,732] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_4_mp_rank_00_optim_states.pt
[2021-11-29 04:30:50,736] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_20_mp_rank_00_optim_states.pt
[2021-11-29 04:30:50,737] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_6_mp_rank_01_optim_states.pt
[2021-11-29 04:30:50,737] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_13_mp_rank_01_optim_states.pt
[2021-11-29 04:30:50,737] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-29 04:30:50,737] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-29 04:30:50,738] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-29 04:30:50,738] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-29 04:30:50,738] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-29 04:30:50,738] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-29 04:30:50,739] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-29 04:30:50,739] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-29 04:30:50,739] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-29 04:30:50,740] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-29 04:30:50,744] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-29 04:30:50,744] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-29 04:30:50,744] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-29 04:30:50,745] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-29 04:30:50,745] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-29 04:30:50,746] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-29 04:30:50,747] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-29 04:30:50,747] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-29 04:30:50,749] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-29 04:30:50,749] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-29 04:30:50,752] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-29 04:30:50,756] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-29 04:30:50,756] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-29 04:30:50,759] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-29 04:30:50,759] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-29 04:30:50,760] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-29 04:30:50,760] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-29 04:30:50,762] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-29 04:30:50,767] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-29 04:30:50,767] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-29 04:30:50,771] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-29 04:30:50,772] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-29 04:30:50,777] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-29 04:30:50,778] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-29 04:30:50,778] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-29 04:30:50,778] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-29 04:30:50,779] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-29 04:30:50,780] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-29 04:30:50,780] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-29 04:30:50,781] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-29 04:30:50,782] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-29 04:30:50,783] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-29 04:30:50,783] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-29 04:30:50,783] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-29 04:30:50,783] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-29 04:30:50,784] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-29 04:30:50,787] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-29 04:30:50,787] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-29 04:30:50,789] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-29 04:30:50,794] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-29 04:30:50,794] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-29 04:30:50,795] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-29 04:30:50,795] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-29 04:30:50,797] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-29 04:30:50,826] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-29 04:30:50,836] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-29 04:30:50,854] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_15_mp_rank_00_optim_states.pt
[2021-11-29 04:30:50,860] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step138753/zero_pp_rank_1_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 138753 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2686.65
[exiting program after 1190.006038081646 minutes] datetime: 2021-11-29 04:30:51
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [YES] ...... [OKAY]
fused_adam ............. [YES] ...... [OKAY]
fused_lamb ............. [YES] ...... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [YES] ...... [OKAY]
stochastic_transformer . [YES] ...... [OKAY]
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed ..
compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... 
[OKAY] stochastic_transformer . [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... 
[OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] ninja .................. 
[OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... 
[OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... 
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... 
[YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY]  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. 
Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report JIT compiled ops requires ninja -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... 
[OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] ninja .................. [OKAY] stochastic_transformer . [YES] ...... [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. 
-------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... 
[OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY]  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY]  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... 
[OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... 
[OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY]  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. 
[YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY]  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... 
[OKAY] stochastic_transformer . [YES] ...... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... 
[OKAY]  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. -------------------------------------------------- -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master ninja .................. [OKAY] deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... 
[OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... 
[OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. 
-------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja -------------------------------------------------- DeepSpeed C++/CUDA extension op report -------------------------------------------------- NOTE: Ops not installed will be just-in-time (JIT) compiled at runtime if needed. Op compatibility means that your system meet the required dependencies to JIT install the op. -------------------------------------------------- JIT compiled ops requires ninja ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ 
[NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY] DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 ninja .................. [OKAY] -------------------------------------------------- op name ................ installed .. compatible -------------------------------------------------- cpu_adam ............... [YES] ...... [OKAY] fused_adam ............. [YES] ...... [OKAY] fused_lamb ............. [YES] ...... [OKAY] sparse_attn ............ [NO] ....... [OKAY] transformer ............ [YES] ...... [OKAY] stochastic_transformer . [YES] ...... [OKAY]  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... 
[NO] DeepSpeed general environment info: transformer_inference .. [NO] ....... [OKAY] torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 utils .................. [YES] ...... [OKAY] deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... 
[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 
0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... 
[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... 
[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... 
[NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  
[WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found. DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 
0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... 
[OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] --------------------------------------------------  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... 
[OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****  [WARNING]  async_io: please install the libaio-devel package with yum **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... 
torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']  [WARNING]  async_io: please install the libaio-devel package with yum torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... [OKAY] -------------------------------------------------- DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... 
['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****  [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY] -------------------------------------------------- **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 DeepSpeed general environment info: torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** > setting tensorboard ... **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** DeepSpeed general environment info: torch install path ............... 
['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch'] torch version .................... 1.8.2 torch cuda version ............... 11.1 nvcc version ..................... 11.2 deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed'] deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master deepspeed wheel compiled w. ...... torch 1.8, cuda 11.1 **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position **** **** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****  [WARNING]  async_io: please install the libaio-devel package with yum  [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found. async_io ............... [NO] ....... [NO] transformer_inference .. [NO] ....... [OKAY] utils .................. [YES] ...... [OKAY] quantizer .............. [NO] ....... 
[OKAY]
--------------------------------------------------
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
using world size: 64, data-parallel-size: 32, tensor-model-parallel size: 1, pipeline-model-parallel size: 2
using torch.float16 for parameters ...
------------------------ arguments ------------------------
accumulate_allreduce_grads_in_fp32 .............. False
adam_beta1 ...................................... 0.9
adam_beta2 ...................................... 0.999
adam_eps ........................................ 1e-08
adlr_autoresume ................................. False
adlr_autoresume_interval ........................ 1000
apply_query_key_layer_scaling ................... True
apply_residual_connection_post_layernorm ........ False
attention_dropout ............................... 0.1
attention_softmax_in_fp32 ....................... False
bert_binary_head ................................ True
bert_load ....................................... None
bf16 ............................................ False
bias_dropout_fusion ............................. True
bias_gelu_fusion ................................ True
biencoder_projection_dim ........................ 0
biencoder_shared_query_context_model ............ False
block_data_path ................................. None
checkpoint_activations .......................... True
checkpoint_in_cpu ............................... False
checkpoint_num_layers ........................... 1
clip_grad ....................................... 1.0
codecarbon_dir .................................. None
consumed_train_samples .......................... 0
consumed_train_tokens ........................... 0
consumed_valid_samples .......................... 0
contigious_checkpointing ........................ False
cpu_optimizer ................................... False
cpu_torch_adam .................................. False
curriculum_learning ............................. False
data_impl ....................................... mmap
data_parallel_size .............................. 32
data_path ....................................... ['/gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document']
dataloader_type ................................. single
DDP_impl ........................................ local
decoder_seq_length .............................. None
deepscale ....................................... False
deepscale_config ................................ None
deepspeed ....................................... True
deepspeed_activation_checkpointing .............. True
deepspeed_config ................................ ./ds_config.2091626.json
deepspeed_mpi ................................... False
distribute_checkpointed_activations ............. False
distributed_backend ............................. nccl
embedding_path .................................. None
encoder_seq_length .............................. 2048
eod_mask_loss ................................... False
eval_interval ................................... 1000
eval_iters ...................................... 100
evidence_data_path .............................. None
exit_duration_in_mins ........................... 1190
exit_interval ................................... None
ffn_hidden_size ................................. 8192
finetune ........................................ False
fp16 ............................................ True
fp16_lm_cross_entropy ........................... False
fp32_residual_connection ........................ False
gigaflos_no_embeds .............................. 0
global_batch_size ............................... 512
glu_activation .................................. None
hidden_dropout .................................. 0.1
hidden_size ..................................... 2048
hysteresis ...................................... 2
ict_head_size ................................... None
ict_load ........................................ None
img_dim ......................................... 224
indexer_batch_size .............................. 128
indexer_log_interval ............................ 1000
init_method_std ................................. 0.02
init_method_xavier_uniform ...................... False
initial_loss_scale .............................. 4294967296
kv_channels ..................................... 128
layernorm_epsilon ............................... 1e-05
lazy_mpu_init ................................... None
load ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
local_rank ...................................... 0
log_batch_size_to_tensorboard ................... True
log_interval .................................... 200
log_learning_rate_to_tensorboard ................ True
log_level ....................................... None
log_level_replica ............................... None
log_loss_scale_to_tensorboard ................... True
log_num_zeros_in_grad ........................... False
log_params_norm ................................. False
log_timers_to_tensorboard ....................... True
log_validation_ppl_to_tensorboard ............... True
loss_on_targets_only ............................ True
loss_scale ...................................... 12.0
loss_scale_window ............................... 1000
lr .............................................. 0.0002
lr_decay_iters .................................. None
lr_decay_samples ................................ 73242187
lr_decay_style .................................. cosine
lr_decay_tokens ................................. None
lr_warmup_fraction .............................. None
lr_warmup_iters ................................. 0
lr_warmup_samples ............................... 183105
make_vocab_size_divisible_by .................... 128
mask_prob ....................................... 0.15
masked_softmax_fusion ........................... True
max_position_embeddings ......................... 2048
memory_centric_tiled_linear ..................... False
merge_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/data/gpt2-merges.txt
micro_batch_size ................................ 1
min_loss_scale .................................. 1.0
min_lr .......................................... 1e-05
mmap_warmup ..................................... False
no_load_optim ................................... None
no_load_rng ..................................... None
no_save_optim ................................... None
no_save_rng ..................................... None
num_attention_heads ............................. 16
num_channels .................................... 3
num_classes ..................................... 1000
num_layers ...................................... 24
num_layers_per_virtual_pipeline_stage ........... None
num_workers ..................................... 2
onnx_safe ....................................... None
openai_gelu ..................................... False
optimizer ....................................... adam
override_lr_scheduler ........................... False
params_dtype .................................... torch.float16
partition_activations ........................... False
patch_dim ....................................... 16
pipeline_model_parallel_size .................... 2
position_embedding_type ......................... PositionEmbeddingType.absolute
profile_backward ................................ False
query_in_block_prob ............................. 0.1
rampup_batch_size ............................... ['32', '32', '2_000_000']
rank ............................................ 0
remote_device ................................... none
reset_attention_mask ............................ False
reset_position_ids .............................. False
retriever_report_topk_accuracies ................ []
retriever_score_scaling ......................... False
retriever_seq_length ............................ 256
reweight_loss_based_on_position_frequency ....... True
sample_rate ..................................... 1.0
save ............................................ /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
save_interval ................................... 1500
scatter_gather_tensors_in_pipeline .............. True
scattered_embeddings ............................ False
seed ............................................ 1234
seq_length ...................................... 2048
sgd_momentum .................................... 0.9
short_seq_prob .................................. 0.1
split ........................................... 949,50,1
split_transformers .............................. False
synchronize_each_layer .......................... False
tensor_model_parallel_size ...................... 1
tensorboard_dir ................................. /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/tr6g-1B3-oscar-loss-reweighting-logs/tensorboard
tensorboard_log_interval ........................ 1
tensorboard_queue_size .......................... 5
test_weighted_split_names ....................... None
test_weighted_split_paths ....................... None
test_weighted_split_splits ...................... None
test_weighted_split_weights ..................... None
tile_factor ..................................... 1
titles_data_path ................................ None
tokenizer_name_or_path .......................... None
tokenizer_type .................................. GPT2BPETokenizer
train_iters ..................................... None
train_samples ................................... 73242187
train_tokens .................................... None
train_weighted_split_paths ...................... None
use_bnb_optimizer ............................... False
use_checkpoint_lr_scheduler ..................... False
use_contiguous_buffers_in_ddp ................... False
use_cpu_initialization .......................... None
use_one_sent_docs ............................... False
use_pin_memory .................................. False
valid_weighted_split_names ...................... None
valid_weighted_split_paths ...................... None
valid_weighted_split_splits ..................... None
valid_weighted_split_weights .................... None
virtual_pipeline_model_parallel_size ............ None
vocab_extra_ids ................................. 0
vocab_file ...................................... /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/data/gpt2-vocab.json
weight_decay .................................... 0.1
world_size ...................................... 64
zero_allgather_bucket_size ...................... 0.0
zero_contigious_gradients ....................... False
zero_reduce_bucket_size ......................... 0.0
zero_reduce_scatter ............................. False
zero_stage ...................................... 1
-------------------- end of arguments ---------------------
will use batch size rampup starting from global batch size 32 to global batch size 512 with batch size increments 32 over 2000000 samples.
> building GPT2BPETokenizer tokenizer ...
**** Git info for Megatron: git_hash=8a83121 git_branch=thomas/reweight_tokens_depending_on_their_position ****
DeepSpeed general environment info:
torch install path ............... ['/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch']
torch version .................... 1.8.2
torch cuda version ............... 11.1
nvcc version ..................... 11.2
deepspeed install path ........... ['/gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed']
deepspeed info ................... 0.5.5+58a8e13, 58a8e13, master
deepspeed wheel compiled w. ......
torch 1.8, cuda 11.1
> padded vocab (size: 50257) with 47 dummy tokens (new size: 50304)
> initializing torch distributed ...
> initializing tensor model parallel with size 1
> initializing pipeline model parallel with size 2
> setting random seeds to 1234 ...
[2021-11-29 04:31:37,374] [INFO] [checkpointing.py:226:model_parallel_cuda_manual_seed] > initializing model parallel cuda seeds on global rank 0, model parallel rank 0, and data parallel rank 0 with model parallel seed: 3952 and data parallel seed: 1234
> compiling dataset index builder ...
make: Entering directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/data' g++ -O3 -Wall -shared -std=c++11 -fPIC -fdiagnostics-color -I/gpfswork/rech/six/commun/conda/thomas_prefix_lm/include/python3.8 -I/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/pybind11/include helpers.cpp -o helpers.cpython-38-x86_64-linux-gnu.so make: Leaving directory '/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/data' >>> done with dataset index builder. Compilation time: 6.218 seconds > compiling and loading fused kernels ... /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module scaled_upper_triang_masked_softmax_cuda... Allowing ninja to set a default number of workers... 
(overridable by setting the environment variable MAX_JOBS=N) [1/3] c++ -MMD -MF scaled_upper_triang_masked_softmax.o.d -DTORCH_EXTENSION_NAME=scaled_upper_triang_masked_softmax_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/THC -isystem /gpfslocalsys/cuda/11.2/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -c /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax.cpp -o scaled_upper_triang_masked_softmax.o [2/3] /gpfslocalsys/cuda/11.2/bin/nvcc --generate-dependencies-with-compile --dependency-output scaled_upper_triang_masked_softmax_cuda.cuda.o.d -DTORCH_EXTENSION_NAME=scaled_upper_triang_masked_softmax_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/THC -isystem /gpfslocalsys/cuda/11.2/include -isystem 
/gpfswork/rech/six/commun/conda/thomas_prefix_lm/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 -gencode arch=compute_70,code=sm_70 --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda -gencode arch=compute_80,code=sm_80 -std=c++14 -c /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/scaled_upper_triang_masked_softmax_cuda.cu -o scaled_upper_triang_masked_softmax_cuda.cuda.o [3/3] c++ scaled_upper_triang_masked_softmax.o scaled_upper_triang_masked_softmax_cuda.cuda.o -shared -L/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/gpfslocalsys/cuda/11.2/lib64 -lcudart -o scaled_upper_triang_masked_softmax_cuda.so Loading extension module scaled_upper_triang_masked_softmax_cuda... /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module scaled_masked_softmax_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [1/3] c++ -MMD -MF scaled_masked_softmax.o.d -DTORCH_EXTENSION_NAME=scaled_masked_softmax_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/THC -isystem /gpfslocalsys/cuda/11.2/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -c /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.cpp -o scaled_masked_softmax.o [2/3] /gpfslocalsys/cuda/11.2/bin/nvcc --generate-dependencies-with-compile --dependency-output scaled_masked_softmax_cuda.cuda.o.d -DTORCH_EXTENSION_NAME=scaled_masked_softmax_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem 
/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/THC -isystem /gpfslocalsys/cuda/11.2/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 -gencode arch=compute_70,code=sm_70 --use_fast_math -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda -gencode arch=compute_80,code=sm_80 -std=c++14 -c /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax_cuda.cu -o scaled_masked_softmax_cuda.cuda.o /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h(320): warning: variable "batch_count" was declared but never referenced /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/scaled_masked_softmax.h(320): warning: variable "batch_count" was declared but never referenced [3/3] c++ scaled_masked_softmax.o scaled_masked_softmax_cuda.cuda.o -shared -L/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/gpfslocalsys/cuda/11.2/lib64 -lcudart -o scaled_masked_softmax_cuda.so Loading extension module scaled_masked_softmax_cuda... /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! 
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( Detected CUDA files, patching ldflags Emitting ninja build file /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/build/build.ninja... Building extension module fused_mix_prec_layer_norm_cuda... Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N) [1/3] c++ -MMD -MF layer_norm_cuda.o.d -DTORCH_EXTENSION_NAME=fused_mix_prec_layer_norm_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/THC -isystem /gpfslocalsys/cuda/11.2/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -O3 -c /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda.cpp -o layer_norm_cuda.o [2/3] 
/gpfslocalsys/cuda/11.2/bin/nvcc --generate-dependencies-with-compile --dependency-output layer_norm_cuda_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=fused_mix_prec_layer_norm_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/TH -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/include/THC -isystem /gpfslocalsys/cuda/11.2/include -isystem /gpfswork/rech/six/commun/conda/thomas_prefix_lm/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -O3 -gencode arch=compute_70,code=sm_70 --use_fast_math -maxrregcount=50 -gencode arch=compute_80,code=sm_80 -std=c++14 -c /gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/fused_kernels/layer_norm_cuda_kernel.cu -o layer_norm_cuda_kernel.cuda.o [3/3] c++ layer_norm_cuda.o layer_norm_cuda_kernel.cuda.o -shared -L/gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/gpfslocalsys/cuda/11.2/lib64 -lcudart -o fused_mix_prec_layer_norm_cuda.so Loading extension module fused_mix_prec_layer_norm_cuda... /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! 
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension.
Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! 
warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source. !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! !! WARNING !! warnings.warn(WRONG_COMPILER_WARNING.format( /gpfswork/rech/six/commun/conda/thomas_prefix_lm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:283: UserWarning: !! WARNING !! !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. 
Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension. See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
>>> done with compiling and loading fused kernels. Compilation time: 164.885 seconds
time to initialize megatron (seconds): 196.479
[after megatron is initialized] datetime: 2021-11-29 04:34:28
building GPT model ...
[2021-11-29 04:34:28,526] [INFO] [utils.py:806:see_memory_usage] Before Building Model
[2021-11-29 04:34:28,527] [INFO] [utils.py:807:see_memory_usage] MA 0.0 GB Max_MA 0.0 GB CA 0.0 GB Max_CA 0 GB
[2021-11-29 04:34:28,527] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.16 GB, percent = 21.4%
SEED_LAYERS=False BASE_SEED=1234 SEED_FN=None
Using topology: {ProcessCoord(pipe=0, data=0, model=0): 0, ProcessCoord(pipe=0, data=1, model=0): 1, ProcessCoord(pipe=0, data=2, model=0): 2, ProcessCoord(pipe=0, data=3, model=0): 3, ProcessCoord(pipe=0, data=4, model=0): 4, ProcessCoord(pipe=0, data=5, model=0): 5, ProcessCoord(pipe=0, data=6, model=0): 6, ProcessCoord(pipe=0, data=7, model=0): 7, ProcessCoord(pipe=0, data=8, model=0): 8, ProcessCoord(pipe=0, data=9, model=0): 9, ProcessCoord(pipe=0, data=10, model=0): 10, ProcessCoord(pipe=0, data=11, model=0): 11, ProcessCoord(pipe=0, data=12, model=0): 12, ProcessCoord(pipe=0, data=13, model=0): 13, ProcessCoord(pipe=0, data=14, model=0): 14, ProcessCoord(pipe=0, data=15, model=0): 15, ProcessCoord(pipe=0, data=16, model=0): 16, ProcessCoord(pipe=0, data=17, model=0): 17, ProcessCoord(pipe=0, data=18, model=0): 18, ProcessCoord(pipe=0, data=19, model=0): 19, ProcessCoord(pipe=0, data=20, model=0): 20, ProcessCoord(pipe=0, data=21, model=0): 21, ProcessCoord(pipe=0, data=22, model=0): 22, ProcessCoord(pipe=0, data=23, model=0): 23, ProcessCoord(pipe=0, data=24, model=0): 24, ProcessCoord(pipe=0, data=25, model=0): 25, ProcessCoord(pipe=0, data=26, model=0): 26, ProcessCoord(pipe=0, data=27, model=0): 27, ProcessCoord(pipe=0, data=28, model=0): 28, ProcessCoord(pipe=0, data=29, model=0): 29, ProcessCoord(pipe=0, data=30, model=0): 30, ProcessCoord(pipe=0, data=31, model=0): 31, ProcessCoord(pipe=1, data=0, model=0): 32, ProcessCoord(pipe=1, data=1, model=0): 33, ProcessCoord(pipe=1, data=2, model=0): 34, ProcessCoord(pipe=1, data=3, model=0): 35, ProcessCoord(pipe=1, data=4, model=0): 36, ProcessCoord(pipe=1, data=5, model=0): 37, ProcessCoord(pipe=1, data=6, model=0): 38, ProcessCoord(pipe=1, data=7, model=0): 39, ProcessCoord(pipe=1, data=8, model=0): 40, ProcessCoord(pipe=1, data=9, model=0): 41, ProcessCoord(pipe=1, data=10, model=0): 42, ProcessCoord(pipe=1, data=11, model=0): 43, ProcessCoord(pipe=1, data=12, model=0): 44, ProcessCoord(pipe=1, data=13, model=0): 45, ProcessCoord(pipe=1, data=14, model=0): 46, ProcessCoord(pipe=1, data=15, model=0): 47, ProcessCoord(pipe=1, data=16, model=0): 48, ProcessCoord(pipe=1, data=17, model=0): 49, ProcessCoord(pipe=1, data=18, model=0): 50, ProcessCoord(pipe=1, data=19, model=0): 51, ProcessCoord(pipe=1, data=20, model=0): 52, ProcessCoord(pipe=1, data=21, model=0): 53, ProcessCoord(pipe=1, data=22, model=0): 54, ProcessCoord(pipe=1, data=23, model=0): 55, ProcessCoord(pipe=1, data=24, model=0): 56, ProcessCoord(pipe=1, data=25, model=0): 57, ProcessCoord(pipe=1, data=26, model=0): 58, ProcessCoord(pipe=1, data=27, model=0): 59, ProcessCoord(pipe=1, data=28, model=0): 60, ProcessCoord(pipe=1, data=29, model=0): 61, ProcessCoord(pipe=1, data=30, model=0): 62, ProcessCoord(pipe=1, data=31, model=0): 63}
[2021-11-29 04:34:29,824] [INFO] [module.py:365:_partition_layers] Partitioning pipeline stages with method type:transformer
stage=0 layers=15
    0: _to_float16
    1: EmbeddingPipe
    2:
    3: ParallelTransformerLayerPipe
    4: ParallelTransformerLayerPipe
    5: ParallelTransformerLayerPipe
    6: ParallelTransformerLayerPipe
    7: ParallelTransformerLayerPipe
    8: ParallelTransformerLayerPipe
    9: ParallelTransformerLayerPipe
    10: ParallelTransformerLayerPipe
    11: ParallelTransformerLayerPipe
    12: ParallelTransformerLayerPipe
    13: ParallelTransformerLayerPipe
    14: ParallelTransformerLayerPipe
stage=1 layers=17
    15: ParallelTransformerLayerPipe
    16: ParallelTransformerLayerPipe
    17: ParallelTransformerLayerPipe
    18: ParallelTransformerLayerPipe
    19: ParallelTransformerLayerPipe
    20: ParallelTransformerLayerPipe
    21: ParallelTransformerLayerPipe
    22: ParallelTransformerLayerPipe
    23: ParallelTransformerLayerPipe
    24: ParallelTransformerLayerPipe
    25: ParallelTransformerLayerPipe
    26: ParallelTransformerLayerPipe
    27:
    28:
    29: MixedFusedLayerNorm
    30: EmbeddingPipe
    31: float16_to_fp32
  loss: CrossEntropy
> number of parameters on (tensor, pipeline) model parallel rank (0, 1): 711520256
[2021-11-29 04:34:30,376] [INFO] [utils.py:806:see_memory_usage] After Building Model
[2021-11-29 04:34:30,376] [INFO] [utils.py:807:see_memory_usage] MA 1.33 GB Max_MA 1.33 GB CA 1.36 GB Max_CA 1 GB
[2021-11-29 04:34:30,376] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 40.37 GB, percent = 21.6%
> number of parameters on (tensor, pipeline) model parallel rank (0, 0): 711516160
setting training iterations to 152972
> learning rate decay style: cosine
DeepSpeed is enabled.
[2021-11-29 04:34:30,396] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed info: version=0.5.5+58a8e13, git-hash=58a8e13, git-branch=master
[2021-11-29 04:34:30,699] [INFO] [engine.py:207:__init__] DeepSpeed Flops Profiler Enabled: False
[2021-11-29 04:34:30,699] [INFO] [engine.py:862:_configure_optimizer] Removing param_group that has no 'params' in the client Optimizer
[2021-11-29 04:34:30,699] [INFO] [engine.py:868:_configure_optimizer] Using client Optimizer as basic optimizer
[2021-11-29 04:34:30,703] [INFO] [engine.py:884:_configure_optimizer] DeepSpeed Basic Optimizer = FusedAdam
[2021-11-29 04:34:30,703] [INFO] [utils.py:43:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=
[2021-11-29 04:34:30,703] [INFO] [logging.py:68:log_dist] [Rank 0] Creating fp16 ZeRO stage 1 optimizer
[2021-11-29 04:34:30,703] [INFO] [stage2.py:111:__init__] Reduce bucket size 500000000
[2021-11-29 04:34:30,703] [INFO] [stage2.py:112:__init__] Allgather bucket size 500000000
[2021-11-29 04:34:30,703] [INFO] [stage2.py:113:__init__] CPU Offload: False
[2021-11-29 04:34:30,703] [INFO] [stage2.py:114:__init__] Round robin gradient partitioning: False
Rank: 35 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 42 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 47 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 44 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 27 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 23 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 40 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 32 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 36 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 5 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 7 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 54 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 56 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 24 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 37 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 51 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 55 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 20 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 11 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 8 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 58 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 18 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 33 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 50 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 31 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 52 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 63 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 30 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 62 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 19 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 4 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 45 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 41 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 49 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 34 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 48 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 38 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 46 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 29 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 57 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 61 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 12 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 53 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 39 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 43 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 25 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 60 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 21 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 16 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 17 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 6 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 9 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 28 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 22 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 26 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 15 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 59 partition count [32, 32] and sizes[(22224896, False), (10112, False)]
Rank: 1 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 10 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 0 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 14 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 13 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 3 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
Rank: 2 partition count [32, 32] and sizes[(22224896, False), (9984, False)]
[2021-11-29 04:34:34,019] [INFO] [utils.py:806:see_memory_usage] Before initializing optimizer states
[2021-11-29 04:34:34,019] [INFO] [utils.py:807:see_memory_usage] MA 1.41 GB Max_MA 1.45 GB CA 2.71 GB Max_CA 3 GB
[2021-11-29 04:34:34,020] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 42.12 GB, percent = 22.5%
[2021-11-29 04:34:34,055] [INFO] [utils.py:806:see_memory_usage] After initializing optimizer states
[2021-11-29 04:34:34,055] [INFO] [utils.py:807:see_memory_usage] MA 1.57 GB Max_MA 1.66 GB CA 2.96 GB Max_CA 3 GB
[2021-11-29 04:34:34,056] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 42.23 GB, percent = 22.6%
[2021-11-29 04:34:34,056] [INFO] [stage2.py:474:__init__] optimizer state initialized
[2021-11-29 04:34:34,089] [INFO] [utils.py:806:see_memory_usage] After initializing ZeRO optimizer
[2021-11-29 04:34:34,089] [INFO] [utils.py:807:see_memory_usage] MA 1.57 GB Max_MA 1.57 GB CA 2.96 GB Max_CA 3 GB
[2021-11-29 04:34:34,090] [INFO] [utils.py:815:see_memory_usage] CPU Virtual Memory: used = 42.31 GB, percent = 22.6%
[2021-11-29 04:34:34,090] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed Final Optimizer = FusedAdam
[2021-11-29 04:34:34,090] [INFO] [engine.py:599:_configure_lr_scheduler] DeepSpeed using client LR scheduler
[2021-11-29 04:34:34,090] [INFO] [logging.py:68:log_dist] [Rank 0] DeepSpeed LR Scheduler =
[2021-11-29 04:34:34,090] [INFO] [logging.py:68:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0, 0.0], mom=[(0.9, 0.999), (0.9, 0.999)]
[2021-11-29 04:34:34,090] [INFO] [config.py:940:print] DeepSpeedEngine configuration:
[2021-11-29 04:34:34,090] [INFO] [config.py:944:print] activation_checkpointing_config {
    "partition_activations": false,
    "contiguous_memory_optimization": false,
    "cpu_checkpointing": false,
    "number_checkpoints": null,
    "synchronize_checkpoint_boundary": false,
    "profile": false
}
[2021-11-29 04:34:34,090] [INFO] [config.py:944:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2021-11-29 04:34:34,090] [INFO] [config.py:944:print] allreduce_always_fp32 ........ False
[2021-11-29 04:34:34,090] [INFO] [config.py:944:print] amp_enabled .................. False
[2021-11-29 04:34:34,090] [INFO] [config.py:944:print] amp_params ................... False
[2021-11-29 04:34:34,090] [INFO] [config.py:944:print] checkpoint_tag_validation_enabled True
[2021-11-29 04:34:34,090] [INFO] [config.py:944:print] checkpoint_tag_validation_fail False
[2021-11-29 04:34:34,090] [INFO] [config.py:944:print] curriculum_enabled ........... False
[2021-11-29 04:34:34,090] [INFO] [config.py:944:print] curriculum_params ............ False
[2021-11-29 04:34:34,090] [INFO] [config.py:944:print] dataloader_drop_last ......... False
[2021-11-29 04:34:34,090] [INFO] [config.py:944:print] disable_allgather ............ False
[2021-11-29 04:34:34,090] [INFO] [config.py:944:print] dump_state ................... False
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] dynamic_loss_scale_args ...... {'init_scale': 4096, 'scale_window': 500, 'delayed_shift': 2, 'min_scale': 1}
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] eigenvalue_enabled ........... False
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] eigenvalue_gas_boundary_resolution 1
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] eigenvalue_layer_name ........ bert.encoder.layer
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] eigenvalue_layer_num ......... 0
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] eigenvalue_max_iter .......... 100
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] eigenvalue_stability ......... 1e-06
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] eigenvalue_tol ............... 0.01
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] eigenvalue_verbose ........... False
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] elasticity_enabled ........... False
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] flops_profiler_config ........ {
    "enabled": false,
    "profile_step": 1,
    "module_depth": -1,
    "top_modules": 1,
    "detailed": true,
    "output_file": null
}
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] fp16_enabled ................. True
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] fp16_master_weights_and_gradients False
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] fp16_mixed_quantize .......... False
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] global_rank .................. 0
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] gradient_accumulation_steps .. 16
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] gradient_clipping ............ 1.0
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] gradient_predivide_factor .... 1.0
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] initial_dynamic_scale ........ 4096
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] loss_scale ................... 0
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] memory_breakdown ............. False
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] optimizer_legacy_fusion ...... False
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] optimizer_name ............... None
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] optimizer_params ............. None
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] pld_enabled .................. False
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] pld_params ................... False
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] prescale_gradients ........... False
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] quantize_change_rate ......... 0.001
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] quantize_groups .............. 1
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] quantize_offset .............. 1000
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] quantize_period .............. 1000
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] quantize_rounding ............ 0
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] quantize_start_bits .......... 16
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] quantize_target_bits ......... 8
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] quantize_training_enabled .... False
[2021-11-29 04:34:34,091] [INFO] [config.py:944:print] quantize_type ................ 0
[2021-11-29 04:34:34,092] [INFO] [config.py:944:print] quantize_verbose ............. False
[2021-11-29 04:34:34,092] [INFO] [config.py:944:print] scheduler_name ............... None
[2021-11-29 04:34:34,092] [INFO] [config.py:944:print] scheduler_params ............. None
[2021-11-29 04:34:34,092] [INFO] [config.py:944:print] sparse_attention ............. None
[2021-11-29 04:34:34,092] [INFO] [config.py:944:print] sparse_gradients_enabled ..... False
[2021-11-29 04:34:34,092] [INFO] [config.py:944:print] steps_per_print .............. 2000
[2021-11-29 04:34:34,092] [INFO] [config.py:944:print] tensorboard_enabled .......... False
[2021-11-29 04:34:34,092] [INFO] [config.py:944:print] tensorboard_job_name ......... DeepSpeedJobName
[2021-11-29 04:34:34,092] [INFO] [config.py:944:print] tensorboard_output_path ......
[2021-11-29 04:34:34,092] [INFO] [config.py:944:print] train_batch_size ............. 512
[2021-11-29 04:34:34,092] [INFO] [config.py:944:print] train_micro_batch_size_per_gpu 1
[2021-11-29 04:34:34,092] [INFO] [config.py:944:print] use_quantizer_kernel ......... False
[2021-11-29 04:34:34,092] [INFO] [config.py:944:print] wall_clock_breakdown ......... False
[2021-11-29 04:34:34,092] [INFO] [config.py:944:print] world_size ................... 32
[2021-11-29 04:34:34,092] [INFO] [config.py:944:print] zero_allow_untested_optimizer False
[2021-11-29 04:34:34,092] [INFO] [config.py:944:print] zero_config .................. {
    "stage": 1,
    "contiguous_gradients": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 5.000000e+08,
    "allgather_partitions": true,
    "allgather_bucket_size": 5.000000e+08,
    "overlap_comm": false,
    "load_from_fp32_weights": true,
    "elastic_checkpoint": true,
    "offload_param": null,
    "offload_optimizer": null,
    "sub_group_size": 1.000000e+09,
    "prefetch_bucket_size": 5.000000e+07,
    "param_persistence_threshold": 1.000000e+05,
    "max_live_parameters": 1.000000e+09,
    "max_reuse_distance": 1.000000e+09,
    "gather_fp16_weights_on_model_save": false,
    "ignore_unused_parameters": true,
    "round_robin_gradients": false,
    "legacy_stage1": false
}
[2021-11-29 04:34:34,092] [INFO] [config.py:944:print] zero_enabled ................. True
[2021-11-29 04:34:34,092] [INFO] [config.py:944:print] zero_optimization_stage ...... 1
[2021-11-29 04:34:34,092] [INFO] [config.py:946:print] json = {
    "train_micro_batch_size_per_gpu": 1,
    "train_batch_size": 512,
    "gradient_clipping": 1.0,
    "zero_optimization": {
        "stage": 1
    },
    "fp16": {
        "enabled": true,
        "loss_scale": 0,
        "loss_scale_window": 500,
        "hysteresis": 2,
        "min_loss_scale": 1,
        "initial_scale_power": 12
    },
    "steps_per_print": 2.000000e+03,
    "wall_clock_breakdown": false
}
[2021-11-29 04:34:34,092] [INFO] [engine.py:93:__init__] CONFIG: micro_batches=16 micro_batch_size=1
[2021-11-29 04:34:34,129] [INFO] [engine.py:151:__init__] RANK=0 STAGE=0 LAYERS=15 [0, 15) STAGE_PARAMS=711516160 (711.516M) TOTAL_PARAMS=1423036416 (1423.036M) UNIQUE_PARAMS=1315819520 (1315.820M)
[2021-11-29 04:34:34,129] [INFO] [engine.py:151:__init__] RANK=32 STAGE=1 LAYERS=17 [15, 32) STAGE_PARAMS=711520256 (711.520M) TOTAL_PARAMS=1423036416 (1423.036M) UNIQUE_PARAMS=1315819520 (1315.820M)
> using checkpoint value 0.0002 for learning rate
> using checkpoint value 1e-05 for minimum learning rate
> using checkpoint value 183105 for warmup iterations
> using checkpoint value 73242187 for total number of iterations
> using checkpoint value cosine for decay style
successfully loaded 32 ZeRO state_dicts for rank 46
successfully loaded 32 ZeRO state_dicts for rank 34
successfully loaded 32 ZeRO state_dicts for rank 59
successfully loaded 32 ZeRO state_dicts for rank 57
successfully loaded 32 ZeRO state_dicts for rank 35
successfully loaded 32 ZeRO state_dicts for rank 47
successfully loaded 32 ZeRO state_dicts for rank 56
successfully loaded 32 ZeRO state_dicts for rank 37
successfully loaded 32 ZeRO state_dicts for rank 58
successfully loaded 32 ZeRO state_dicts for rank 44
successfully loaded 32 ZeRO state_dicts for rank 52
successfully loaded 32 ZeRO state_dicts for rank 45
successfully loaded 32 ZeRO state_dicts for rank 49
successfully loaded 32 ZeRO state_dicts for rank 50
successfully loaded 32 ZeRO state_dicts for rank 32
successfully loaded 32 ZeRO state_dicts for rank 54
successfully loaded 32 ZeRO state_dicts for rank 48
successfully loaded 32 ZeRO state_dicts for rank 51
successfully loaded 32 ZeRO state_dicts for rank 61
successfully loaded 32 ZeRO state_dicts for rank 55
successfully loaded 32 ZeRO state_dicts for rank 38
successfully loaded 32 ZeRO state_dicts for rank 39
successfully loaded 32 ZeRO state_dicts for rank 53
successfully loaded 32 ZeRO state_dicts for rank 62
successfully loaded 32 ZeRO state_dicts for rank 36
successfully loaded 32 ZeRO state_dicts for rank 63
successfully loaded 32 ZeRO state_dicts for rank 60
successfully loaded 32 ZeRO state_dicts for rank 30
successfully loaded 32 ZeRO state_dicts for rank 28
successfully loaded 32 ZeRO state_dicts for rank 29
successfully loaded 32 ZeRO state_dicts for rank 20
successfully loaded 32 ZeRO state_dicts for rank 17
successfully loaded 32 ZeRO state_dicts for rank 31
successfully loaded 32 ZeRO state_dicts for rank 22
successfully loaded 32 ZeRO state_dicts for rank 26
successfully loaded 32 ZeRO state_dicts for rank 27
successfully loaded 32 ZeRO state_dicts for rank 21
successfully loaded 32 ZeRO state_dicts for rank 23
successfully loaded 32 ZeRO state_dicts for rank 19
successfully loaded 32 ZeRO state_dicts for rank 16
successfully loaded 32 ZeRO state_dicts for rank 25
successfully loaded 32 ZeRO state_dicts for rank 24
successfully loaded 32 ZeRO state_dicts for rank 5
successfully loaded 32 ZeRO state_dicts for rank 7
successfully loaded 32 ZeRO state_dicts for rank 6
successfully loaded 32 ZeRO state_dicts for rank 4
successfully loaded 32 ZeRO state_dicts for rank 9
successfully loaded 32 ZeRO state_dicts for rank 10
successfully loaded 32 ZeRO state_dicts for rank 11
successfully loaded 32 ZeRO state_dicts for rank 8
successfully loaded 32 ZeRO state_dicts for rank 12
successfully loaded 32 ZeRO state_dicts for rank 15
successfully loaded 32 ZeRO state_dicts for rank 3
successfully loaded 32 ZeRO state_dicts for rank 33
successfully loaded 32 ZeRO state_dicts for rank 18
successfully loaded 32 ZeRO state_dicts for rank 2
successfully loaded 32 ZeRO state_dicts for rank 43
successfully loaded 32 ZeRO state_dicts for rank 42
successfully loaded 32 ZeRO state_dicts for rank 40
successfully loaded 32 ZeRO state_dicts for rank 41
successfully loaded 32 ZeRO state_dicts for rank 13
successfully loaded 32 ZeRO state_dicts for rank 14
loading 32 zero partition checkpoints for rank 54
loading 32 zero partition checkpoints for rank 56
loading 32 zero partition checkpoints for rank 44
loading 32 zero partition checkpoints for rank 32
loading 32 zero partition checkpoints for rank 34
loading 32 zero partition checkpoints for rank 45
loading 32 zero partition checkpoints for rank 52
loading 32 zero partition checkpoints for rank 35
loading 32 zero partition checkpoints for rank 63
loading 32 zero partition checkpoints for rank 55
loading 32 zero partition checkpoints for rank 49
loading 32 zero partition checkpoints for rank 57
loading 32 zero partition checkpoints for rank 48
loading 32 zero partition checkpoints for rank 50
loading 32 zero partition checkpoints for rank 36
loading 32 zero partition checkpoints for rank 58
loading 32 zero partition checkpoints for rank 61
loading 32 zero partition checkpoints for rank 51
loading 32 zero partition checkpoints for rank 53
loading 32 zero partition checkpoints for rank 47
loading 32 zero partition checkpoints for rank 59
loading 32 zero partition checkpoints for rank 62
loading 32 zero partition checkpoints for rank 39
loading 32 zero partition checkpoints for rank 19
loading 32 zero partition checkpoints for rank 38
loading 32 zero partition checkpoints for rank 31
loading 32 zero partition checkpoints for rank 60
loading 32 zero partition checkpoints for rank 26
loading 32 zero partition checkpoints for rank 17
loading 32 zero partition checkpoints for rank 16
loading 32 zero partition checkpoints for rank 28
loading 32 zero partition checkpoints for rank 46
loading 32 zero partition checkpoints for rank 30
loading 32 zero partition checkpoints for rank 20
loading 32 zero partition checkpoints for rank 7
loading 32 zero partition checkpoints for rank 5
loading 32 zero partition checkpoints for rank 25
loading 32 zero partition checkpoints for rank 29
loading 32 zero partition checkpoints for rank 22
loading 32 zero partition checkpoints for rank 27
loading 32 zero partition checkpoints for rank 24
loading 32 zero partition checkpoints for rank 23
loading 32 zero partition checkpoints for rank 8
loading 32 zero partition checkpoints for rank 21
loading 32 zero partition checkpoints for rank 37
loading 32 zero partition checkpoints for rank 33
loading 32 zero partition checkpoints for rank 18
loading 32 zero partition checkpoints for rank 10
loading 32 zero partition checkpoints for rank 6
loading 32 zero partition checkpoints for rank 11
loading 32 zero partition checkpoints for rank 4
loading 32 zero partition checkpoints for rank 9
loading 32 zero partition checkpoints for rank 40
loading 32 zero partition checkpoints for rank 41
successfully loaded 32 ZeRO state_dicts for rank 0
loading 32 zero partition checkpoints for rank 42
loading 32 zero partition checkpoints for rank 43
successfully loaded 32 ZeRO state_dicts for rank 1
loading 32 zero partition checkpoints for rank 15
loading 32 zero partition checkpoints for rank 12
loading 32 zero partition checkpoints for rank 3
loading 32 zero partition checkpoints for rank 2
loading 32 zero partition checkpoints for rank 13
loading 32 zero partition checkpoints for rank 14
loading 32 zero partition checkpoints for rank 0
checkpoint version 3.0
loading 32 zero partition checkpoints for rank 1
successfully loaded checkpoint from /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints at iteration 138753
time (ms) | load-checkpoint: 16215.03
/gpfsssd/scratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/code/Megatron-DeepSpeed/megatron/utils.py:277: UserWarning: Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings warnings.warn("Parameter count with the embeddings will be inaccurate with PP > 1, as the first and last stage hold several copies of the embeddings") estimated model parameters: 1.42303232 estimated model parameters without embeddings: 1.208598528 [after model, optimizer, and learning rate scheduler are built] datetime: 2021-11-29 04:34:50 > building train, validation, and test datasets ... > datasets target sizes (minimum size): train: 73242187 validation: 7833600 test: 51200 > building train, validation, and test datasets for GPT ... > building dataset index ... reading sizes... reading pointers... reading document index... creating numpy buffer of mmap... creating memory view of numpy buffer... 
> finished creating indexed dataset in 0.037685 seconds number of documents: 304230423 > dataset split: train: document indices in [0, 288714672) total of 288714672 documents validation: document indices in [288714672, 303926193) total of 15211521 documents test: document indices in [303926193, 304230423) total of 304230 documents > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_doc_idx.npy > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_sample_idx.npy > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_train_indexmap_73242187ns_2048sl_1234s_shuffle_idx.npy loaded indexed file in 0.110 seconds total number of samples: 131537224 total number of epochs: 1 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_doc_idx.npy > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_sample_idx.npy > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_valid_indexmap_7833600ns_2048sl_1234s_shuffle_idx.npy loaded indexed file in 0.131 seconds total number of samples: 13854322 total number of epochs: 2 > loading doc-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_doc_idx.npy > loading sample-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_sample_idx.npy > loading shuffle-idx mapping from /gpfswork/rech/six/commun/datasets-custom/oscar-en/meg-gpt2_text_document_test_indexmap_51200ns_2048sl_1234s_shuffle_idx.npy loaded indexed file in 0.072 seconds 
total number of samples: 137384 total number of epochs: 1 > finished creating GPT datasets ... [after dataloaders are built] datetime: 2021-11-29 04:34:59 done with setup ... training ... time (ms) | model-and-optimizer-setup: 21905.36 | train/valid/test-data-iterators-setup: 8380.23 Number of parameters: 1.42303232 billion Number of parameters: 1.423040512 billion Number of parameters without embeddings: 1.20860672 billion Number of parameters without embeddings: 1.208598528 billion [before the start of training step] datetime: 2021-11-29 04:34:59 [2021-11-29 04:34:59,167] [INFO] [checkpointing.py:547:forward] Activation Checkpointing Information [2021-11-29 04:34:59,167] [INFO] [checkpointing.py:548:forward] ----Partition Activations False, CPU CHECKPOINTING False [2021-11-29 04:34:59,167] [INFO] [checkpointing.py:551:forward] ----contiguous Memory Checkpointing False with 24 total layers [2021-11-29 04:34:59,167] [INFO] [checkpointing.py:554:forward] ----Synchronization False [2021-11-29 04:34:59,167] [INFO] [checkpointing.py:555:forward] ----Profiling time in checkpointing False /gpfsdswork/projects/rech/six/uty16tp/code/big_science/DeepSpeed-Official/deepspeed/runtime/pipe/engine.py:1169: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations. 
if t.grad is not None: [Rank 0] (after 138800 iterations) memory (MB) | allocated: 1631.6650390625 | max allocated: 3929.2744140625 | reserved: 6816.0 | max reserved: 6816.0 [Rank 32] (after 138800 iterations) memory (MB) | allocated: 2443.63623046875 | max allocated: 4725.25341796875 | reserved: 7900.0 | max reserved: 7900.0 iteration 138800/ 152972 | consumed samples: 65985984 | consumed tokens: 135139295232 | elapsed time per iteration (ms): 4739.0 | learning rate: 1.477E-05 | global batch size: 512 | lm loss: 1.380738E+00 | loss scale: 262144.0 | grad norm: 32798.023 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 139000/ 152972 | consumed samples: 66088384 | consumed tokens: 135349010432 | elapsed time per iteration (ms): 4683.8 | learning rate: 1.464E-05 | global batch size: 512 | lm loss: 1.401021E+00 | loss scale: 262144.0 | grad norm: 31191.474 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | -------------------------------------------------------------------------------------------- valid loss at iteration 139000 | lm loss value: 1.367113E+00 | lm loss PPL: 3.924004E+00 | -------------------------------------------------------------------------------------------- iteration 139200/ 152972 | consumed samples: 66190784 | consumed tokens: 135558725632 | elapsed time per iteration (ms): 5219.9 | learning rate: 1.451E-05 | global batch size: 512 | lm loss: 1.391717E+00 | loss scale: 131072.0 | grad norm: 20776.979 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 139400/ 152972 | consumed samples: 66293184 | consumed tokens: 135768440832 | elapsed time per iteration (ms): 4687.5 | learning rate: 1.438E-05 | global batch size: 512 | lm loss: 1.408783E+00 | loss scale: 131072.0 | grad norm: 12412.247 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | saving checkpoint at iteration 139500 to 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-29 05:35:09,172] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/mp_rank_00_model_states.pt [2021-11-29 05:35:09,603] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,605] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,606] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,608] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,609] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,609] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,609] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-29 
05:35:09,609] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,611] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,611] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,612] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,612] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,613] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,613] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,613] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,614] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,615] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,615] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,616] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,618] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,621] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,621] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,623] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,625] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,627] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,630] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,633] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,635] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,640] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,641] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,641] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,642] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,642] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,642] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,643] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,644] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,645] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,645] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,646] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,647] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,647] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,648] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,649] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,650] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,651] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,651] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,651] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,651] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,651] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,652] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,655] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,655] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,657] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,657] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,658] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,660] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,660] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,661] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,664] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,666] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,666] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,669] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-29 05:35:09,675] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-29 05:35:09,683] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step139500/zero_pp_rank_26_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 139500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2665.70
iteration 139600/ 152972 | consumed samples: 66395584 | consumed tokens: 135978156032 | elapsed time per iteration (ms): 4682.5 | learning rate: 1.426E-05 | global batch size: 512 | lm loss: 1.407327E+00 | loss scale: 131072.0 | grad norm: 14588.093 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 139800/ 152972 | consumed samples: 66497984 | consumed tokens: 136187871232 | elapsed time per iteration (ms): 4681.9 | learning rate: 1.414E-05 | global batch size: 512 | lm loss: 1.378247E+00 | loss scale: 262144.0 | grad norm: 15791.286 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-29 06:14:09,824] [INFO] [logging.py:68:log_dist] [Rank 0] step=140000, skipped=299, lr=[1.4015535236975445e-05, 1.4015535236975445e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 140000 loss: 1.5874 iter time (s): 0.002 samples/sec: 217971.078
iteration 140000/ 152972 | consumed samples: 66600384 | consumed tokens: 136397586432 | elapsed time per iteration (ms): 4685.3 | learning rate: 1.402E-05 | global batch size: 512 | lm loss: 1.422707E+00 | loss scale: 131072.0 | grad norm: 16802.662 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 140000 | lm loss value: 1.443653E+00 | lm loss PPL: 4.236142E+00 |
--------------------------------------------------------------------------------------------
iteration 140200/ 152972 | consumed samples: 66702784 | consumed tokens: 136607301632 | elapsed time per iteration (ms): 5222.4 | learning rate: 1.390E-05 | global batch size: 512 | lm loss: 1.440071E+00 | loss scale: 131072.0 | grad norm: 16287.973 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 140400/ 152972 | consumed samples: 66805184 | consumed tokens: 136817016832 | elapsed time per iteration (ms): 4664.1 | learning rate: 1.378E-05 | global batch size: 512 | lm loss: 1.455977E+00 | loss scale: 262144.0 | grad norm: 12942.489 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 140600/ 152972 | consumed samples: 66907584 | consumed tokens: 137026732032 | elapsed time per iteration (ms): 4691.0 | learning rate: 1.366E-05 | global batch size: 512 | lm loss: 1.456851E+00 | loss scale: 262144.0 | grad norm: 35407.659 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 140800/ 152972 | consumed samples: 67009984 | consumed tokens: 137236447232 | elapsed time per iteration (ms): 4686.1 | learning rate: 1.355E-05 | global batch size: 512 | lm loss: 1.361158E+00 | loss scale: 262144.0 | grad norm: 37273.868 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 141000/ 152972 | consumed samples: 67112384 | consumed tokens: 137446162432 | elapsed time per iteration (ms): 4685.1 | learning rate: 1.344E-05 | global batch size: 512 | lm loss: 1.433967E+00 | loss scale: 131072.0 | grad norm: 15144.780 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 141000 | lm loss value: 1.509623E+00 | lm loss PPL: 4.525025E+00 |
--------------------------------------------------------------------------------------------
saving checkpoint at iteration 141000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-29 07:35:49,942] [INFO] [logging.py:68:log_dist] [Rank 0] Saving
model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/mp_rank_00_model_states.pt [2021-11-29 07:35:50,368] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,368] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,377] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,378] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,378] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,379] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,381] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,383] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,383] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,384] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,385] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,385] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,385] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,387] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,387] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,387] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,387] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,387] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,388] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,389] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,389] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,392] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,394] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,395] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,395] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,397] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,397] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,398] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,399] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,399] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,401] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,401] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,402] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,402] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,406] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,409] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,411] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,412] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,412] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,413] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,413] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,415] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,416] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,416] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,417] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,418] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,418] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,418] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,421] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,422] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,423] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,424] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,426] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,429] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,431] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,434] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,435] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-29 07:35:50,436] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,437] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,440] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,440] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,441] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,442] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-29 07:35:50,449] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step141000/zero_pp_rank_13_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 141000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2912.62
iteration 141200/ 152972 | consumed samples: 67214784 | consumed tokens: 137655877632 | elapsed time per iteration (ms): 5256.2 | learning rate: 1.333E-05 | global batch size: 512 | lm loss: 1.408207E+00 | loss scale: 131072.0 | grad norm: 20844.793 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 141400/ 152972 | consumed samples: 67317184 | consumed tokens: 137865592832 | elapsed time per iteration (ms): 4688.6 | learning rate: 1.322E-05 | global batch size: 512 | lm loss: 1.445639E+00 | loss scale: 131072.0 | grad norm: 18132.249 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 141600/ 152972 | consumed samples: 67419584 | consumed tokens: 138075308032 | elapsed time per iteration (ms): 4676.3 | learning rate: 1.311E-05 | global batch size: 512 | lm loss: 1.426917E+00 | loss scale: 131072.0 | grad norm: 18925.706 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 141800/ 152972 | consumed samples: 67521984 | consumed tokens: 138285023232 | elapsed time per iteration (ms): 4694.4 | learning rate: 1.301E-05 | global batch size: 512 | lm loss: 1.443178E+00 | loss scale: 131072.0 | grad norm: 17953.186 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-29 08:54:01,854] [INFO] [logging.py:68:log_dist] [Rank 0] step=142000, skipped=303, lr=[1.2902833255571082e-05, 1.2902833255571082e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 142000 loss: 1.1663 iter time (s): 0.002 samples/sec: 219989.651
iteration 142000/ 152972 | consumed samples: 67624384 | consumed tokens: 138494738432 | elapsed time per iteration (ms): 4695.9 | learning rate: 1.290E-05 | global batch size: 512 | lm loss: 1.487170E+00 | loss scale: 131072.0 | grad norm: 14901.369 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 142000 | lm loss value: 1.372343E+00 | lm loss PPL: 3.944581E+00 |
--------------------------------------------------------------------------------------------
iteration 142200/ 152972 | consumed samples: 67726784 | consumed tokens: 138704453632 | elapsed time per iteration (ms): 5210.3 | learning rate: 1.280E-05 | global batch size: 512 | lm loss: 1.363821E+00 | loss scale: 65536.0 | grad norm: 5146.707 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 142400/ 152972 | consumed samples: 67829184 | consumed tokens: 138914168832 | elapsed time per iteration (ms): 4665.9 | learning rate: 1.270E-05 | global batch size: 512 | lm loss: 1.424317E+00 | loss scale: 65536.0 | grad norm: 7706.712 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 142500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-29 09:34:45,648] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/mp_rank_00_model_states.pt
[2021-11-29 09:34:46,071] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_27_mp_rank_00_optim_states.pt
[2021-11-29 09:34:46,076] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,076] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,078] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,078] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,080] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,080] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,080] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,081] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,082] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,082] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,082] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,083] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,083] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,084] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,084] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,084] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,085] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,086] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,086] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,086] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,087] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,087] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,090] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,092] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,092] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,094] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,095] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,097] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,097] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,098] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,103] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,105] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,105] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,110] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,111] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,112] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,112] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,112] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,113] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,114] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,115] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,116] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,116] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,117] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,119] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,120] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,121] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,121] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,122] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,123] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,124] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,126] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,126] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,128] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,128] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,129] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,132] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,132] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,136] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,137] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,137] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-29 09:34:46,143] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-29 09:34:46,151] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step142500/zero_pp_rank_12_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 142500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2691.68 iteration 142600/ 152972 | consumed samples: 67931584 | consumed tokens: 139123884032 | elapsed time per iteration (ms): 4681.6 | learning rate: 1.260E-05 | global batch size: 512 | lm loss: 1.406725E+00 | loss scale: 131072.0 | grad norm: 
15934.558 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 142800/ 152972 | consumed samples: 68033984 | consumed tokens: 139333599232 | elapsed time per iteration (ms): 4696.8 | learning rate: 1.251E-05 | global batch size: 512 | lm loss: 1.431062E+00 | loss scale: 131072.0 | grad norm: 13099.721 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 143000/ 152972 | consumed samples: 68136384 | consumed tokens: 139543314432 | elapsed time per iteration (ms): 4693.2 | learning rate: 1.241E-05 | global batch size: 512 | lm loss: 1.450073E+00 | loss scale: 131072.0 | grad norm: 22963.393 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | -------------------------------------------------------------------------------------------- valid loss at iteration 143000 | lm loss value: 1.403675E+00 | lm loss PPL: 4.070130E+00 | -------------------------------------------------------------------------------------------- iteration 143200/ 152972 | consumed samples: 68238784 | consumed tokens: 139753029632 | elapsed time per iteration (ms): 5241.3 | learning rate: 1.232E-05 | global batch size: 512 | lm loss: 1.367809E+00 | loss scale: 262144.0 | grad norm: 20878.635 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 143400/ 152972 | consumed samples: 68341184 | consumed tokens: 139962744832 | elapsed time per iteration (ms): 4665.4 | learning rate: 1.223E-05 | global batch size: 512 | lm loss: 1.420061E+00 | loss scale: 65536.0 | grad norm: 12323.660 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 143600/ 152972 | consumed samples: 68443584 | consumed tokens: 140172460032 | elapsed time per iteration (ms): 4680.6 | learning rate: 1.214E-05 | global batch size: 512 | lm loss: 1.343801E+00 | loss scale: 65536.0 | grad norm: 10339.748 | num zeros: 0.0 | number of skipped iterations: 
0 | number of nan iterations: 0 | iteration 143800/ 152972 | consumed samples: 68545984 | consumed tokens: 140382175232 | elapsed time per iteration (ms): 4683.2 | learning rate: 1.205E-05 | global batch size: 512 | lm loss: 1.419000E+00 | loss scale: 65536.0 | grad norm: 7275.376 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [2021-11-29 11:33:41,599] [INFO] [logging.py:68:log_dist] [Rank 0] step=144000, skipped=307, lr=[1.1967954587502549e-05, 1.1967954587502549e-05], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 144000 loss: 0.8004 iter time (s): 0.002 samples/sec: 219367.062 iteration 144000/ 152972 | consumed samples: 68648384 | consumed tokens: 140591890432 | elapsed time per iteration (ms): 4680.4 | learning rate: 1.197E-05 | global batch size: 512 | lm loss: 1.397540E+00 | loss scale: 131072.0 | grad norm: 11161.323 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | -------------------------------------------------------------------------------------------- valid loss at iteration 144000 | lm loss value: 1.412039E+00 | lm loss PPL: 4.104315E+00 | -------------------------------------------------------------------------------------------- saving checkpoint at iteration 144000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-29 11:35:33,265] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/mp_rank_00_model_states.pt [2021-11-29 11:35:33,685] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,686] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,688] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,688] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,688] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,688] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,690] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,694] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,695] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,696] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,697] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,700] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,700] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,700] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,701] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,702] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,703] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,705] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,706] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,707] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,707] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,707] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,708] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,708] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,709] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,711] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,711] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,712] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,713] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,714] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,718] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,722] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,726] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,727] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,727] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,728] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,728] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,729] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,729] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,730] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,730] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,732] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,735] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,735] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,737] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,737] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,737] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,737] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,738] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,739] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,740] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,740] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,742] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,742] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,743] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,743] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,744] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,746] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,746] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,750] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,754] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,754] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-29 11:35:33,769] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-29 11:35:33,772] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step144000/zero_pp_rank_12_mp_rank_01_optim_states.pt successfully saved checkpoint at iteration 144000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2585.27 iteration 144200/ 152972 | consumed samples: 68750784 | consumed tokens: 140801605632 | elapsed time per iteration (ms): 5230.2 | learning rate: 1.188E-05 | global batch size: 512 | lm loss: 1.428031E+00 | loss scale: 131072.0 | grad norm: 
10353.743 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 144400/ 152972 | consumed samples: 68853184 | consumed tokens: 141011320832 | elapsed time per iteration (ms): 4710.8 | learning rate: 1.180E-05 | global batch size: 512 | lm loss: 1.448700E+00 | loss scale: 262144.0 | grad norm: 34877.252 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 144600/ 152972 | consumed samples: 68955584 | consumed tokens: 141221036032 | elapsed time per iteration (ms): 4730.3 | learning rate: 1.172E-05 | global batch size: 512 | lm loss: 1.401215E+00 | loss scale: 262144.0 | grad norm: 31104.603 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 144800/ 152972 | consumed samples: 69057984 | consumed tokens: 141430751232 | elapsed time per iteration (ms): 4734.7 | learning rate: 1.164E-05 | global batch size: 512 | lm loss: 1.377413E+00 | loss scale: 262144.0 | grad norm: 31550.265 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 145000/ 152972 | consumed samples: 69160384 | consumed tokens: 141640466432 | elapsed time per iteration (ms): 4714.2 | learning rate: 1.157E-05 | global batch size: 512 | lm loss: 1.460189E+00 | loss scale: 131072.0 | grad norm: 20837.968 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | -------------------------------------------------------------------------------------------- valid loss at iteration 145000 | lm loss value: 1.447380E+00 | lm loss PPL: 4.251959E+00 | -------------------------------------------------------------------------------------------- iteration 145200/ 152972 | consumed samples: 69262784 | consumed tokens: 141850181632 | elapsed time per iteration (ms): 5226.7 | learning rate: 1.149E-05 | global batch size: 512 | lm loss: 1.482966E+00 | loss scale: 131072.0 | grad norm: 17193.512 | num zeros: 0.0 | number of skipped 
iterations: 0 | number of nan iterations: 0 | iteration 145400/ 152972 | consumed samples: 69365184 | consumed tokens: 142059896832 | elapsed time per iteration (ms): 4679.7 | learning rate: 1.142E-05 | global batch size: 512 | lm loss: 1.343832E+00 | loss scale: 65536.0 | grad norm: 5353.805 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | saving checkpoint at iteration 145500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-29 13:34:56,832] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/mp_rank_00_model_states.pt [2021-11-29 13:34:57,259] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,263] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,267] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,264] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,271] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,261] [INFO] [engine.py:2540:_save_zero_checkpoint] zero 
checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,263] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,276] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,266] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,272] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,268] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,271] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,265] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,277] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,267] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,269] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,270] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,273] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,281] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,269] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,277] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,285] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,288] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,274] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,276] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,289] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,274] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,272] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,289] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,277] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,278] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,292] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,295] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,295] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,298] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,300] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,300] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,302] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,303] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,304] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,304] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,306] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,306] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,307] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,309] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,310] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,310] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,311] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,314] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,314] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,315] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,319] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,319] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,320] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,324] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,324] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,329] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,337] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,339] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,341] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-29 13:34:57,344] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-29 13:34:57,377] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_8_mp_rank_00_optim_states.pt
[2021-11-29 13:34:57,409] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_12_mp_rank_00_optim_states.pt
[2021-11-29 13:34:57,436] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step145500/zero_pp_rank_6_mp_rank_01_optim_states.pt
successfully saved checkpoint at iteration 145500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2718.80
iteration 145600/ 152972 | consumed samples: 69467584 | consumed tokens: 142269612032 | elapsed time per iteration (ms): 4696.6 | learning rate: 1.135E-05 | global batch size: 512 | lm loss: 1.446489E+00 | loss scale: 65536.0 | grad norm: 9151.391 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 145800/ 152972 | consumed samples: 69569984 | consumed tokens: 142479327232 | elapsed time per iteration (ms): 4686.1 | learning rate: 1.128E-05 | global batch size: 512 | lm loss: 1.391850E+00 | loss scale: 131072.0 | grad norm: 17636.599 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-29 14:13:59,732] [INFO] [logging.py:68:log_dist] [Rank 0] step=146000, skipped=310, lr=[1.1212371213869069e-05, 1.1212371213869069e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 146000 loss: 1.7907 iter time (s): 0.002 samples/sec: 220620.454
iteration 146000/ 152972 | consumed samples: 69672384 | consumed tokens: 142689042432 | elapsed time per iteration (ms): 4681.4 | learning rate: 1.121E-05 | global batch size: 512 | lm loss: 1.441876E+00 | loss scale: 131072.0 | grad norm: 18971.673 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 146000 | lm loss value: 1.365758E+00 | lm loss PPL: 3.918691E+00 |
--------------------------------------------------------------------------------------------
iteration 146200/ 152972 | consumed samples: 69774784 | consumed tokens: 142898757632 | elapsed time per iteration (ms): 5208.3 | learning rate: 1.115E-05 | global batch size: 512 | lm loss: 1.420899E+00 | loss scale: 131072.0 | grad norm: 16737.377 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 146400/ 152972 | consumed samples: 69877184 | consumed tokens: 143108472832 | elapsed time per iteration (ms): 4668.5 | learning rate: 1.108E-05 | global batch size: 512 | lm loss: 1.412415E+00 | loss scale: 262144.0 | grad norm: 45393.801 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 146600/ 152972 | consumed samples: 69979584 | consumed tokens: 143318188032 | elapsed time per iteration (ms): 4671.0 | learning rate: 1.102E-05 | global batch size: 512 | lm loss: 1.452185E+00 | loss scale: 131072.0 | grad norm: 14142.359 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 146800/ 152972 | consumed samples: 70081984 | consumed tokens: 143527903232 | elapsed time per iteration (ms): 4669.2 | learning rate: 1.096E-05 | global batch size: 512 | lm loss: 1.412156E+00 | loss scale: 65536.0 | grad norm: 9272.190 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 147000/ 152972 | consumed samples: 70184384 | consumed tokens: 143737618432 | elapsed time per iteration (ms): 4695.4 | learning rate: 1.090E-05 | global batch size: 512 | lm loss: 1.405385E+00 | loss scale: 65536.0 | grad norm: 8661.162 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 147000 | lm loss value: 1.459220E+00 | lm loss PPL: 4.302600E+00 |
--------------------------------------------------------------------------------------------
saving checkpoint at iteration 147000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-29 15:35:32,149] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/mp_rank_00_model_states.pt
[2021-11-29 15:35:32,584] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_11_mp_rank_00_optim_states.pt
[2021-11-29 15:35:32,590] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_15_mp_rank_00_optim_states.pt
[2021-11-29 15:35:32,590] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_30_mp_rank_01_optim_states.pt
[2021-11-29 15:35:32,592] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_0_mp_rank_01_optim_states.pt
[2021-11-29 15:35:32,593] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_20_mp_rank_00_optim_states.pt
[2021-11-29 15:35:32,595] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-29 15:35:32,597] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-29 15:35:32,597] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-29 15:35:32,598] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-29 15:35:32,598] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-29 15:35:32,598] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-29 15:35:32,599] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-29 15:35:32,599] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-29 15:35:32,599] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-29 15:35:32,599] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-29 15:35:32,600] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-29 15:35:32,600] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-29 15:35:32,603] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-29 15:35:32,603] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-29 15:35:32,603] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-29 15:35:32,605] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-29 15:35:32,606] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-29 15:35:32,607] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-29 15:35:32,608] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-29 15:35:32,609] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-29 15:35:32,609] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-29 15:35:32,610] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-29 15:35:32,611] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-29 15:35:32,614] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-29 15:35:32,622] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-29 15:35:32,622] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-29 15:35:32,622] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-29 15:35:32,627] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-29 15:35:32,627] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-29 15:35:32,627] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-29 15:35:32,630] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-29 15:35:32,630] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-29 15:35:32,630] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-29 15:35:32,631] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-29 15:35:32,631] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-29 15:35:32,632] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-29 15:35:32,632] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-29 15:35:32,632] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-29 15:35:32,632] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-29 15:35:32,634] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-29 15:35:32,635] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-29 15:35:32,635] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-29 15:35:32,635] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-29 15:35:32,636] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-29 15:35:32,637] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-29 15:35:32,640] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-29 15:35:32,640] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-29 15:35:32,643] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-29 15:35:32,643] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-29 15:35:32,645] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-29 15:35:32,647] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-29 15:35:32,648] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-29 15:35:32,652] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-29 15:35:32,658] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-29 15:35:32,680] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-29 15:35:32,720] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-29 15:35:32,754] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-29 15:35:32,763] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-29 15:35:33,106] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step147000/zero_pp_rank_13_mp_rank_00_optim_states.pt successfully saved checkpoint at iteration 147000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2811.35 iteration 147200/ 152972 | consumed samples: 70286784 | consumed tokens: 143947333632 | elapsed time per iteration (ms): 5225.9 | learning rate: 1.085E-05 | global batch size: 512 | lm loss: 1.426454E+00 | loss scale: 131072.0 | grad norm: 20658.135 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 147400/ 152972 | consumed samples: 70389184 | consumed tokens: 144157048832 | elapsed time per iteration (ms): 4690.6 | learning rate: 1.079E-05 | global batch size: 512 | lm loss: 1.445200E+00 | loss scale: 131072.0 | grad norm: 13661.808 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 147600/ 152972 | consumed samples: 70491584 | consumed tokens: 144366764032 | elapsed time per iteration (ms): 4700.9 | learning rate: 1.074E-05 | global batch size: 512 | lm loss: 1.410879E+00 | loss scale: 131072.0 | grad norm: 21026.653 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 147800/ 152972 | consumed samples: 70593984 | consumed tokens: 144576479232 | elapsed time per iteration (ms): 4678.3 | learning rate: 1.069E-05 | global batch size: 512 | lm loss: 
1.422960E+00 | loss scale: 131072.0 | grad norm: 18683.901 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [2021-11-29 16:53:37,176] [INFO] [logging.py:68:log_dist] [Rank 0] step=148000, skipped=315, lr=[1.0638540701109412e-05, 1.0638540701109412e-05], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 148000 loss: 1.6391 iter time (s): 0.002 samples/sec: 219487.214 iteration 148000/ 152972 | consumed samples: 70696384 | consumed tokens: 144786194432 | elapsed time per iteration (ms): 4679.1 | learning rate: 1.064E-05 | global batch size: 512 | lm loss: 1.386961E+00 | loss scale: 131072.0 | grad norm: 18930.575 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | -------------------------------------------------------------------------------------------- valid loss at iteration 148000 | lm loss value: 1.364979E+00 | lm loss PPL: 3.915640E+00 | -------------------------------------------------------------------------------------------- iteration 148200/ 152972 | consumed samples: 70798784 | consumed tokens: 144995909632 | elapsed time per iteration (ms): 5212.8 | learning rate: 1.059E-05 | global batch size: 512 | lm loss: 1.390863E+00 | loss scale: 131072.0 | grad norm: 11861.990 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 148400/ 152972 | consumed samples: 70901184 | consumed tokens: 145205624832 | elapsed time per iteration (ms): 4680.4 | learning rate: 1.055E-05 | global batch size: 512 | lm loss: 1.388840E+00 | loss scale: 65536.0 | grad norm: 7873.504 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | saving checkpoint at iteration 148500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-29 17:34:24,202] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/mp_rank_00_model_states.pt [2021-11-29 17:34:24,628] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,631] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,632] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,635] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,636] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,637] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,639] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,639] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,640] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,640] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,640] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,642] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,643] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,643] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,643] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,643] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,644] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,645] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,645] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,646] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,648] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,649] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,650] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,650] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,650] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,652] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,655] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,656] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,658] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,659] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,662] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,663] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,664] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,665] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,667] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,667] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,669] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,670] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,670] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,671] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,672] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,675] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,675] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,677] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,677] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,677] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,679] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,680] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,682] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,682] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,682] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,683] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,683] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,683] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,685] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,686] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,687] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-29 17:34:24,689] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,689] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,689] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,690] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,691] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,693] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-29 17:34:24,693] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step148500/zero_pp_rank_15_mp_rank_01_optim_states.pt successfully saved checkpoint at iteration 148500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints time (ms) | save-checkpoint: 2766.15 iteration 148600/ 152972 | consumed samples: 71003584 | consumed tokens: 145415340032 | elapsed time per iteration (ms): 4675.5 | learning rate: 1.050E-05 | global batch size: 512 | lm loss: 1.421551E+00 | loss scale: 65536.0 | grad norm: 9212.564 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 148800/ 152972 | consumed samples: 71105984 | consumed tokens: 145625055232 | elapsed time per iteration (ms): 4685.0 | learning rate: 1.046E-05 | global batch size: 512 | lm loss: 1.447534E+00 | loss scale: 131072.0 | grad norm: 12977.041 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 149000/ 152972 | consumed samples: 71208384 | consumed tokens: 145834770432 | elapsed time per iteration (ms): 4682.5 | learning rate: 1.042E-05 | global batch size: 512 | lm loss: 1.400846E+00 | loss scale: 131072.0 | grad norm: 14722.048 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | -------------------------------------------------------------------------------------------- valid loss at iteration 149000 | lm loss value: 1.379523E+00 | lm loss PPL: 3.973007E+00 | -------------------------------------------------------------------------------------------- iteration 149200/ 152972 | consumed samples: 71310784 | consumed tokens: 146044485632 | elapsed time per iteration (ms): 5245.3 | learning rate: 1.038E-05 | global batch size: 512 | lm loss: 1.474322E+00 | loss scale: 131072.0 | grad norm: 15650.875 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 149400/ 152972 | consumed samples: 71413184 | consumed 
tokens: 146254200832 | elapsed time per iteration (ms): 4676.0 | learning rate: 1.034E-05 | global batch size: 512 | lm loss: 1.448429E+00 | loss scale: 131072.0 | grad norm: 11595.219 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 149600/ 152972 | consumed samples: 71515584 | consumed tokens: 146463916032 | elapsed time per iteration (ms): 4684.5 | learning rate: 1.031E-05 | global batch size: 512 | lm loss: 1.427059E+00 | loss scale: 262144.0 | grad norm: 29698.294 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | iteration 149800/ 152972 | consumed samples: 71617984 | consumed tokens: 146673631232 | elapsed time per iteration (ms): 4672.7 | learning rate: 1.028E-05 | global batch size: 512 | lm loss: 1.433222E+00 | loss scale: 262144.0 | grad norm: 28746.470 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | [2021-11-29 19:33:16,563] [INFO] [logging.py:68:log_dist] [Rank 0] step=150000, skipped=319, lr=[1.0246572345484613e-05, 1.0246572345484613e-05], mom=[(0.9, 0.999), (0.9, 0.999)] steps: 150000 loss: 1.2654 iter time (s): 0.002 samples/sec: 219014.538 iteration 150000/ 152972 | consumed samples: 71720384 | consumed tokens: 146883346432 | elapsed time per iteration (ms): 4682.4 | learning rate: 1.025E-05 | global batch size: 512 | lm loss: 1.401401E+00 | loss scale: 131072.0 | grad norm: 15368.130 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 | -------------------------------------------------------------------------------------------- valid loss at iteration 150000 | lm loss value: 1.380250E+00 | lm loss PPL: 3.975897E+00 | -------------------------------------------------------------------------------------------- saving checkpoint at iteration 150000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints [2021-11-29 19:35:06,996] [INFO] [logging.py:68:log_dist] [Rank 0] Saving 
model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/mp_rank_00_model_states.pt [2021-11-29 19:35:07,468] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,474] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,492] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,495] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_1_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,498] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,499] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,534] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_11_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,537] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,538] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,542] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,542] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,543] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,543] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,545] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,545] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_6_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,546] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,546] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,547] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,549] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,550] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,550] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,552] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,552] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,553] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,553] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,554] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_5_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,554] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,555] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,555] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,556] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,560] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,561] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_16_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,562] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,563] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,564] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,569] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,570] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,574] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,575] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,575] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,576] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,576] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,577] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,578] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,579] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,579] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_19_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,582] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,582] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,582] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,582] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,582] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,582] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,584] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,585] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,586] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,586] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_10_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,587] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,590] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,594] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,594] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,596] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-29 19:35:07,598] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,598] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_12_mp_rank_01_optim_states.pt [2021-11-29 19:35:07,606] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step150000/zero_pp_rank_20_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 150000 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2683.07
iteration 150200/ 152972 | consumed samples: 71822784 | consumed tokens: 147093061632 | elapsed time per iteration (ms): 5253.7 | learning rate: 1.022E-05 | global batch size: 512 | lm loss: 1.383119E+00 | loss scale: 131072.0 | grad norm: 17195.063 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 150400/ 152972 | consumed samples: 71925184 | consumed tokens: 147302776832 | elapsed time per iteration (ms): 4688.9 | learning rate: 1.019E-05 | global batch size: 512 | lm loss: 1.452179E+00 | loss scale: 131072.0 | grad norm: 18677.661 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 150600/ 152972 | consumed samples: 72027584 | consumed tokens: 147512492032 | elapsed time per iteration (ms): 4689.5 | learning rate: 1.016E-05 | global batch size: 512 | lm loss: 1.395660E+00 | loss scale: 262144.0 | grad norm: 36657.325 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 150800/ 152972 | consumed samples: 72129984 | consumed tokens: 147722207232 | elapsed time per iteration (ms): 4696.2 | learning rate: 1.014E-05 | global batch size: 512 | lm loss: 1.389283E+00 | loss scale: 262144.0 | grad norm: 36867.878 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 151000/ 152972 | consumed samples: 72232384 | consumed tokens: 147931922432 | elapsed time per iteration (ms): 4681.7 | learning rate: 1.012E-05 | global batch size: 512 | lm loss: 1.454095E+00 | loss scale: 131072.0 | grad norm: 25976.230 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
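The counters in the iteration entries above can be cross-checked against each other. The following is a minimal sketch, not part of the training code: the helper name `parse_entry` and the shortened sample lines are illustrative assumptions, and only the numeric values are copied from the log. It parses two consecutive entries and derives the tokens-per-sample ratio and the number of samples consumed between two log points.

```python
import re

# Two iteration entries from the log above, shortened to the fields used here.
LINES = [
    "iteration 150200/ 152972 | consumed samples: 71822784 | "
    "consumed tokens: 147093061632 | global batch size: 512 |",
    "iteration 150400/ 152972 | consumed samples: 71925184 | "
    "consumed tokens: 147302776832 | global batch size: 512 |",
]


def parse_entry(line):
    """Return the integer counters of one iteration entry as a dict (hypothetical helper)."""
    fields = {}
    match = re.search(r"iteration\s+(\d+)/\s*(\d+)", line)
    fields["iteration"] = int(match.group(1))
    fields["total"] = int(match.group(2))
    for key in ("consumed samples", "consumed tokens", "global batch size"):
        value = re.search(re.escape(key) + r":\s*(\d+)", line).group(1)
        fields[key] = int(value)
    return fields


first, second = (parse_entry(line) for line in LINES)

# Tokens per sample, derived purely from the two logged counters.
tokens_per_sample = first["consumed tokens"] // first["consumed samples"]

# Samples consumed between the two log points, compared with
# (iterations elapsed) * (global batch size).
samples_between = second["consumed samples"] - first["consumed samples"]
expected = (second["iteration"] - first["iteration"]) * first["global batch size"]

print(tokens_per_sample, samples_between, expected)
```

With these two entries the ratio comes out to 2048 tokens per sample, and the 200 iterations between log points account for 200 * 512 = 102400 samples, which matches the difference between the two `consumed samples` counters.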
--------------------------------------------------------------------------------------------
valid loss at iteration 151000 | lm loss value: 1.331673E+00 | lm loss PPL: 3.787373E+00 |
--------------------------------------------------------------------------------------------
iteration 151200/ 152972 | consumed samples: 72334784 | consumed tokens: 148141637632 | elapsed time per iteration (ms): 5232.6 | learning rate: 1.010E-05 | global batch size: 512 | lm loss: 1.360068E+00 | loss scale: 131072.0 | grad norm: 18892.279 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 151400/ 152972 | consumed samples: 72437184 | consumed tokens: 148351352832 | elapsed time per iteration (ms): 4674.6 | learning rate: 1.008E-05 | global batch size: 512 | lm loss: 1.432375E+00 | loss scale: 131072.0 | grad norm: 8131.446 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
saving checkpoint at iteration 151500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
[2021-11-29 21:34:10,253] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/mp_rank_00_model_states.pt
[2021-11-29 21:34:10,677] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_1_mp_rank_00_optim_states.pt
[2021-11-29 21:34:10,680] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_11_mp_rank_00_optim_states.pt
[2021-11-29 21:34:10,680] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_3_mp_rank_01_optim_states.pt [2021-11-29 21:34:10,682] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_2_mp_rank_00_optim_states.pt [2021-11-29 21:34:10,683] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_31_mp_rank_00_optim_states.pt [2021-11-29 21:34:10,683] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_31_mp_rank_01_optim_states.pt [2021-11-29 21:34:10,684] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_30_mp_rank_01_optim_states.pt [2021-11-29 21:34:10,684] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_5_mp_rank_00_optim_states.pt [2021-11-29 21:34:10,684] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_30_mp_rank_00_optim_states.pt [2021-11-29 21:34:10,686] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_10_mp_rank_00_optim_states.pt [2021-11-29 21:34:10,687] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_18_mp_rank_00_optim_states.pt [2021-11-29 21:34:10,689] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_21_mp_rank_01_optim_states.pt [2021-11-29 21:34:10,690] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_14_mp_rank_01_optim_states.pt [2021-11-29 21:34:10,690] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_7_mp_rank_01_optim_states.pt [2021-11-29 21:34:10,691] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_2_mp_rank_01_optim_states.pt [2021-11-29 21:34:10,691] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_26_mp_rank_00_optim_states.pt [2021-11-29 21:34:10,691] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_20_mp_rank_01_optim_states.pt [2021-11-29 21:34:10,692] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_24_mp_rank_00_optim_states.pt [2021-11-29 21:34:10,694] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_9_mp_rank_01_optim_states.pt [2021-11-29 21:34:10,694] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_27_mp_rank_01_optim_states.pt [2021-11-29 21:34:10,694] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_7_mp_rank_00_optim_states.pt [2021-11-29 21:34:10,694] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_19_mp_rank_01_optim_states.pt [2021-11-29 21:34:10,696] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_17_mp_rank_00_optim_states.pt [2021-11-29 21:34:10,699] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_23_mp_rank_00_optim_states.pt [2021-11-29 21:34:10,700] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_13_mp_rank_01_optim_states.pt [2021-11-29 21:34:10,700] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_13_mp_rank_00_optim_states.pt [2021-11-29 21:34:10,703] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_22_mp_rank_00_optim_states.pt [2021-11-29 21:34:10,706] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_18_mp_rank_01_optim_states.pt [2021-11-29 21:34:10,707] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_24_mp_rank_01_optim_states.pt [2021-11-29 21:34:10,707] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_12_mp_rank_00_optim_states.pt [2021-11-29 21:34:10,709] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_22_mp_rank_01_optim_states.pt [2021-11-29 21:34:10,710] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_3_mp_rank_00_optim_states.pt [2021-11-29 21:34:10,711] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_8_mp_rank_01_optim_states.pt [2021-11-29 21:34:10,713] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_25_mp_rank_00_optim_states.pt [2021-11-29 21:34:10,713] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_14_mp_rank_00_optim_states.pt [2021-11-29 21:34:10,716] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_29_mp_rank_01_optim_states.pt [2021-11-29 21:34:10,716] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_28_mp_rank_01_optim_states.pt [2021-11-29 21:34:10,720] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_23_mp_rank_01_optim_states.pt [2021-11-29 21:34:10,722] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_9_mp_rank_00_optim_states.pt [2021-11-29 21:34:10,723] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_4_mp_rank_01_optim_states.pt [2021-11-29 21:34:10,723] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_15_mp_rank_00_optim_states.pt [2021-11-29 21:34:10,724] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_0_mp_rank_01_optim_states.pt [2021-11-29 21:34:10,725] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_25_mp_rank_01_optim_states.pt [2021-11-29 21:34:10,725] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_11_mp_rank_01_optim_states.pt [2021-11-29 21:34:10,726] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_27_mp_rank_00_optim_states.pt [2021-11-29 21:34:10,726] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_16_mp_rank_01_optim_states.pt [2021-11-29 21:34:10,727] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_8_mp_rank_00_optim_states.pt [2021-11-29 21:34:10,728] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_17_mp_rank_01_optim_states.pt [2021-11-29 21:34:10,730] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_21_mp_rank_00_optim_states.pt [2021-11-29 21:34:10,730] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_28_mp_rank_00_optim_states.pt [2021-11-29 21:34:10,731] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_29_mp_rank_00_optim_states.pt [2021-11-29 21:34:10,731] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_0_mp_rank_00_optim_states.pt [2021-11-29 21:34:10,732] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_4_mp_rank_00_optim_states.pt [2021-11-29 21:34:10,733] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_1_mp_rank_01_optim_states.pt [2021-11-29 21:34:10,734] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_6_mp_rank_00_optim_states.pt [2021-11-29 21:34:10,734] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_15_mp_rank_01_optim_states.pt [2021-11-29 21:34:10,736] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_26_mp_rank_01_optim_states.pt [2021-11-29 21:34:10,739] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_20_mp_rank_00_optim_states.pt [2021-11-29 21:34:10,741] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved 
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_16_mp_rank_00_optim_states.pt
[2021-11-29 21:34:10,746] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_6_mp_rank_01_optim_states.pt
[2021-11-29 21:34:10,746] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_12_mp_rank_01_optim_states.pt
[2021-11-29 21:34:10,746] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_10_mp_rank_01_optim_states.pt
[2021-11-29 21:34:10,750] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_19_mp_rank_00_optim_states.pt
[2021-11-29 21:34:10,754] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step151500/zero_pp_rank_5_mp_rank_01_optim_states.pt
successfully saved checkpoint at iteration 151500 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
time (ms) | save-checkpoint: 2854.16
iteration 151600/ 152972 | consumed samples: 72539584 | consumed tokens: 148561068032 | elapsed time per iteration (ms): 4688.5 | learning rate: 1.006E-05 | global batch size: 512 | lm loss: 1.418091E+00 | loss scale: 262144.0 | grad norm: 20242.119 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 151800/ 152972 | consumed samples: 72641984 | consumed tokens: 148770783232 | elapsed time per iteration (ms): 4677.0 | learning rate: 1.005E-05 | global batch size: 512 | lm loss: 1.390659E+00 | loss scale: 262144.0 | grad norm: 27674.141 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[2021-11-29 22:13:10,709] [INFO] [logging.py:68:log_dist] [Rank 0] step=152000, skipped=323, lr=[1.0037556072438103e-05, 1.0037556072438103e-05], mom=[(0.9, 0.999), (0.9, 0.999)]
steps: 152000 loss: 2.0827 iter time (s): 0.002 samples/sec: 219112.908
iteration 152000/ 152972 | consumed samples: 72744384 | consumed tokens: 148980498432 | elapsed time per iteration (ms): 4688.0 | learning rate: 1.004E-05 | global batch size: 512 | lm loss: 1.405122E+00 | loss scale: 131072.0 | grad norm: 23723.064 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
--------------------------------------------------------------------------------------------
valid loss at iteration 152000 | lm loss value: 1.404997E+00 | lm loss PPL: 4.075515E+00 |
--------------------------------------------------------------------------------------------
iteration 152200/ 152972 | consumed samples: 72846784 | consumed tokens: 149190213632 | elapsed time per iteration (ms): 5220.0 | learning rate: 1.003E-05 | global batch size: 512 | lm loss: 1.398634E+00 | loss scale: 131072.0 | grad norm: 15487.143 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 152400/ 152972 | consumed samples: 72949184 | consumed tokens: 149399928832 | elapsed time per iteration (ms): 4700.0 | learning rate: 1.002E-05 | global batch size: 512 | lm loss: 1.426604E+00 | loss scale: 262144.0 | grad norm: 39846.653 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 152600/ 152972 | consumed samples: 73051584 | consumed tokens: 149609644032 | elapsed time per iteration (ms): 4664.5 | learning rate: 1.001E-05 | global batch size: 512 | lm loss: 1.383711E+00 | loss scale: 262144.0 | grad norm: 23119.214 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
iteration 152800/ 152972 | consumed samples: 73153984 | consumed tokens: 149819359232 | elapsed time per iteration (ms): 4668.8 | learning rate: 1.001E-05 | global batch size: 512 | lm loss: 1.378195E+00 | loss scale: 131072.0 | grad norm: 15325.895 | num zeros: 0.0 | number of skipped iterations: 0 | number of nan iterations: 0 |
[after training is done] datetime: 2021-11-29 23:30:49
saving checkpoint at iteration 152972 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
------------------------------------------------------------------------------------------------------------
valid loss at the end of training for val data | lm loss value: 1.404385E+00 | lm loss PPL: 4.073020E+00 |
------------------------------------------------------------------------------------------------------------
[2021-11-29 23:32:39,212] [INFO] [logging.py:68:log_dist] [Rank 0] Saving model checkpoint: /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/mp_rank_00_model_states.pt
[2021-11-29 23:32:39,632] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_1_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,633] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_1_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,634] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_31_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,639] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved
/gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_15_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,641] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_29_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,644] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_21_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,644] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_27_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,644] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_21_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,644] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_27_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,646] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_24_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,645] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_13_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,646] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_26_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,647] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_30_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,647] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_18_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,651] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_2_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,651] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_3_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,651] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_5_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,652] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_6_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,653] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_9_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,655] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_8_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,656] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_17_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,657] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_18_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,658] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_28_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,659] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_15_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,660] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_20_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,661] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_13_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,666] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_2_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,666] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_28_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,666] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_14_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,666] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_22_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,668] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_22_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,668] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_19_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,669] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_11_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,669] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_6_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,669] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_16_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,672] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_7_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,673] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_25_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,673] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_23_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,679] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_25_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,682] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_29_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,682] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_10_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,682] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_23_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,683] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_30_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,683] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_24_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,685] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_10_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,686] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_7_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,687] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_4_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,688] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_26_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,689] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_19_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,692] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_3_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,692] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_31_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,693] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_12_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,693] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_5_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,695] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_20_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,695] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_16_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,696] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_11_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,696] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_17_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,698] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_14_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,701] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_0_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,703] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_4_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,704] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_12_mp_rank_00_optim_states.pt
[2021-11-29 23:32:39,710] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_9_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,714] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_8_mp_rank_01_optim_states.pt
[2021-11-29 23:32:39,725] [INFO] [engine.py:2540:_save_zero_checkpoint] zero checkpoint saved /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints/global_step152972/zero_pp_rank_0_mp_rank_00_optim_states.pt
successfully saved checkpoint at iteration 152972 to /gpfsscratch/rech/six/commun/checkpoints/tr6g-1B3-oscar-loss-reweighting/checkpoints
------------------------------------------------------------------------------------------------------------
 test loss at the end of training for test data | lm loss value: 1.391494E+00 | lm loss PPL: 4.020852E+00 |
------------------------------------------------------------------------------------------------------------
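
The two figures in the final summary line appear to be related by PPL = exp(loss), assuming the reported lm loss is a mean per-token cross-entropy in nats. The sketch below is not part of the training code; `parse_summary` and `SUMMARY` are hypothetical names introduced here only to pull both values out of the line and recompute the perplexity as a sanity check.

```python
import math
import re

# Summary line copied from the end of the log above.
SUMMARY = (
    "test loss at the end of training for test data | "
    "lm loss value: 1.391494E+00 | lm loss PPL: 4.020852E+00 |"
)


def parse_summary(line):
    """Return (loss, ppl) parsed from a summary line of the form shown above.

    Hypothetical helper for illustration; the field labels are taken
    verbatim from the log.
    """
    loss = re.search(r"lm loss value:\s*([0-9.+\-Ee]+)", line)
    ppl = re.search(r"lm loss PPL:\s*([0-9.+\-Ee]+)", line)
    if loss is None or ppl is None:
        raise ValueError("summary line does not contain both fields")
    return float(loss.group(1)), float(ppl.group(1))


loss, ppl = parse_summary(SUMMARY)

# Assumption: perplexity is the exponential of the mean loss (in nats).
recomputed = math.exp(loss)

print(f"logged loss      = {loss:.6f}")
print(f"logged PPL       = {ppl:.6f}")
print(f"exp(logged loss) = {recomputed:.6f}")
```

The recomputed value agrees with the logged PPL up to the rounding of the printed loss, which supports the assumption for this run.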